Dbochman/utf8-double-encoding-fix

Name: utf8-double-encoding-fix
Author: Dbochman

.claude/skills/utf8-double-encoding-fix/SKILL.md

npx skillsauth add Dbochman/dotfiles utf8-double-encoding-fix

UTF-8 Double-Encoding Corruption Fix

Problem

UTF-8 characters become garbled after passing through text processing pipelines. For example, the arrow → (U+2192) becomes Ã¢ÂÂ or similar mojibake sequences.

This happens when UTF-8 bytes are incorrectly interpreted as Latin-1 (ISO-8859-1) and then re-encoded as UTF-8, sometimes multiple times.

Context / Trigger Conditions

- Non-ASCII characters (arrows, emojis, accented letters) display incorrectly
- Text shows patterns like Ã¢ÂÂ, Ã©, Ã¼ instead of →, é, ü
- Corruption appeared after:
  - Migration scripts processing markdown/YAML
  - Cloudflare Workers handling content
  - Text pipelines with mixed encoding handling
  - gray-matter parsing YAML frontmatter
- Original content was correct (verified in git history or source)

Solution

Step 1: Diagnose the Encoding

Check the raw bytes to understand the corruption level:

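with open('corrupted-file.md', 'rb') as f:
    content = f.read()

# Find the corrupted section by searching for known ASCII text next to it
marker = b'55 '  # or other known text near corruption
pos = content.find(marker)
if pos == -1:
    raise SystemExit(f"Marker {marker!r} not found")

# Show the raw bytes and how they render when decoded as UTF-8
print("Bytes:", content[pos:pos+20].hex(' '))
print("As UTF-8:", content[pos:pos+20].decode('utf-8', errors='replace'))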

Corruption patterns:

c3 a2 c2 86 c2 92 = double-encoded → (one decode needed)
c3 83 c2 a2 c3 82 c2 86 c3 82 c2 92 = triple-encoded → (two decodes needed)
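
The corruption level can also be read off programmatically by counting how many repair passes succeed. This is a minimal sketch rather than part of the skill itself; count_encoding_layers and max_passes are hypothetical names chosen for illustration.

```python
def count_encoding_layers(raw: bytes, max_passes: int = 5) -> int:
    """Return how many Latin-1 -> UTF-8 repair passes succeed on raw.

    0 means the bytes are ordinary UTF-8, or cannot be repaired this way.
    """
    text = raw.decode("utf-8")
    passes = 0
    while passes < max_passes:
        try:
            repaired = text.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # no further mis-encoding layer to peel off
        if repaired == text:
            break  # pure ASCII: a pass changes nothing
        text = repaired
        passes += 1
    return passes


# The two byte patterns listed above, plus a clean arrow for comparison
print(count_encoding_layers(bytes.fromhex("c3a2c286c292")))              # 1
print(count_encoding_layers(bytes.fromhex("c383c2a2c382c286c382c292")))  # 2
print(count_encoding_layers("→".encode("utf-8")))                        # 0
```

The count only applies when the whole input is corrupted to the same depth; input that mixes clean and corrupted text reports 0 because the first pass fails.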

Step 2: Apply the Fix

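The fix is to encode the text as Latin-1 to recover the original bytes, then decode those bytes as UTF-8, repeating until clean:

files_to_fix = ['file1.md', 'file2.md']

for filepath in files_to_fix:
    with open(filepath, 'rb') as f:
        content = f.read()

    text = content.decode('utf-8')

    # Peel off one mis-encoding layer per pass. Checking only for 'Ã' would
    # miss a double-encoded → (it decodes to 'â' plus two control characters),
    # so keep going until a pass fails or changes nothing.
    while True:
        try:
            repaired = text.encode('latin-1').decode('utf-8')
        except (UnicodeDecodeError, UnicodeEncodeError):
            break  # Can't decode further (or the file mixes clean and corrupted text)
        if repaired == text:
            break  # Pure ASCII, nothing to repair
        text = repaired

    # newline='' keeps the file's original line endings untouched
    with open(filepath, 'w', encoding='utf-8', newline='') as f:
        f.write(text)

    print(f"Fixed: {filepath}")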
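
A whole-file pass stops as soon as the file contains a single character outside Latin-1, such as an arrow that was never corrupted. For files that mix clean and corrupted text, one possible variation is to repair each run of Latin-1-range characters separately. The sketch below is an assumption-laden illustration, not the skill's own method; repair_mixed and its helper names are hypothetical.

```python
import re

# Only characters in U+0080..U+00FF can be UTF-8 bytes misread as Latin-1.
_LATIN1_RUN = re.compile(r"[\u0080-\u00ff]+")


def _repair_run(match):
    """Re-decode one run; leave it alone if it is not valid UTF-8 underneath."""
    run = match.group(0)
    try:
        return run.encode("latin-1").decode("utf-8")
    except UnicodeDecodeError:
        return run  # a legitimate Latin-1 character such as a lone é


def repair_mixed(text: str, max_passes: int = 5) -> str:
    """Repair corrupted runs in place, repeating for multiply-encoded text."""
    for _ in range(max_passes):
        repaired = _LATIN1_RUN.sub(_repair_run, text)
        if repaired == text:
            break
        text = repaired
    return text


# A clean arrow, a double-encoded arrow, and a double-encoded é in one string
sample = "ok → bad \u00e2\u0086\u0092 caf\u00c3\u00a9"
print(repair_mixed(sample))  # ok → bad → café
```

Two limits are worth keeping in mind: text that legitimately contains a sequence such as Ã© would be rewritten, and a corrupted run directly adjacent to a legitimate accented character fails to decode and is left as it is.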

Step 3: Verify the Fix

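with open('fixed-file.md', 'r', encoding='utf-8') as f:
    content = f.read()

# Check for leftover corruption before declaring success:
# 'Ã' marks deeper layers, 'â\x86\x92' is a double-encoded arrow
if 'Ã' in content or 'â\x86\x92' in content:
    print("✗ Still corrupted - may need another decode pass")
elif '→' in content:
    print("✓ Arrow character restored")
else:
    print("? No arrow found - check that this is the right file")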

Verification

After fixing:

- The file should display correctly in editors
- grep "→" should find the arrows, and grep "Ã" should find nothing
- Any build/precompile process should pass without encoding errors
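
The grep check catches only the Ã pattern. A slightly broader check, sketched here under the assumption that any run of Latin-1-range characters which is itself valid UTF-8 is suspicious, reports the line numbers that still look corrupted. find_suspect_lines is a hypothetical helper name.

```python
import re
from typing import List

_LATIN1_RUN = re.compile(r"[\u0080-\u00ff]+")


def _looks_double_encoded(run: str) -> bool:
    """True if the run's Latin-1 bytes form valid UTF-8."""
    try:
        run.encode("latin-1").decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_suspect_lines(text: str) -> List[int]:
    """Return 1-based numbers of lines that still contain mojibake-like runs."""
    suspects = []
    # split("\n") rather than splitlines(): the latter also breaks on U+0085,
    # a control character that can appear inside corrupted text
    for number, line in enumerate(text.split("\n"), start=1):
        if any(_looks_double_encoded(m.group(0)) for m in _LATIN1_RUN.finditer(line)):
            suspects.append(number)
    return suspects


text = "55 → 98 Lighthouse\nBlog LCP 5.6s \u00e2\u0086\u0092 3.1s\ncaf\u00e9 «ok»"
print(find_suspect_lines(text))  # [2]
```

An empty list means no run matched; it does not prove the file is correct, only that this heuristic found nothing.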

Example

Before (corrupted):

55 Ã¢ÂÂ 98 Lighthouse, system fonts
Blog LCP 5.6s Ã¢ÂÂ 3.1s (45% faster)

After (fixed):

55 → 98 Lighthouse, system fonts
Blog LCP 5.6s → 3.1s (45% faster)

Notes

Why This Happens

The encoding chain that causes this:

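- Original: → stored as UTF-8 bytes e2 86 92
- Mistake: Code reads those bytes as Latin-1 characters: â followed by the control characters U+0086 and U+0092 (a Windows-1252 reader would show † and ’ instead)
- Re-encode: Those Latin-1 characters encoded as UTF-8: c3 a2 c2 86 c2 92
- Result: Ã¢ÂÂ when those bytes are displayed as Latin-1 again (the two control characters are usually invisible)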

If this happens twice (triple-encoding), you need two decode passes.
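
The chain above can be reproduced in a few lines, which is a convenient way to confirm the byte patterns before touching real files. This is an illustrative sketch; misread_once is a hypothetical helper that simulates one bad round trip.

```python
def misread_once(text: str) -> str:
    """Simulate one bad round trip: UTF-8 bytes read back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


original = "→"
double = misread_once(original)   # one mistake: double-encoded
triple = misread_once(double)     # two mistakes: triple-encoded

print(original.encode("utf-8").hex(" "))  # e2 86 92
print(double.encode("utf-8").hex(" "))    # c3 a2 c2 86 c2 92
print(triple.encode("utf-8").hex(" "))    # c3 83 c2 a2 c3 82 c2 86 c3 82 c2 92

# Each repair pass undoes exactly one mistake
assert triple.encode("latin-1").decode("utf-8") == double
assert double.encode("latin-1").decode("utf-8") == original
```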

Common Causes

- Cloudflare Workers: Missing charset=utf-8 in Content-Type header when sending to GitHub API
- gray-matter: Writing YAML without proper string quoting for non-ASCII
- Migration scripts: Reading files without specifying encoding, defaulting to system locale
- Shell pipelines: Commands that don't preserve UTF-8 (e.g., some sed versions)

Prevention

- Always specify encoding='utf-8' when reading/writing files in Python
- In Node.js, use fs.readFileSync(path, 'utf-8') explicitly
- In Cloudflare Workers, set Content-Type: application/json; charset=utf-8
- Quote strings containing non-ASCII in YAML frontmatter
- Test with non-ASCII characters in CI to catch encoding issues early
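
A CI check along these lines could push a non-ASCII sample through each pipeline stage and fail if it comes back changed. The sketch below assumes stages can be modeled as functions from bytes to bytes; survives, good_stage, and bad_stage are hypothetical names, and bad_stage deliberately simulates the wrong-codec bug.

```python
from typing import Callable

SAMPLE = "55 → 98, café, über"


def survives(stage: Callable[[bytes], bytes], sample: str = SAMPLE) -> bool:
    """True if stage returns UTF-8 bytes that still decode to sample."""
    try:
        return stage(sample.encode("utf-8")).decode("utf-8") == sample
    except UnicodeDecodeError:
        return False


def good_stage(raw: bytes) -> bytes:
    """A stage that decodes and encodes with the same explicit codec."""
    return raw.decode("utf-8").encode("utf-8")


def bad_stage(raw: bytes) -> bytes:
    """Simulated bug: input decoded as Latin-1, output written as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


assert survives(good_stage)
assert not survives(bad_stage)
```

In a real project the stage argument would wrap the actual migration script or serializer rather than these stand-ins.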

Related Issues

- If corruption shows as \xe2\x86\x92 (escaped bytes), it's a different issue - a bytes representation was written out as text, so the bytes were never decoded at all
- If corruption shows as ? or �, the data was actually lost (replacement character) and may not be recoverable
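
Both cases can be told apart from double-encoding mechanically. The following sketch, with hypothetical helper names unescape_utf8_bytes and has_lost_characters, turns literal backslash-x escape runs back into characters when they form valid UTF-8 and flags text that contains U+FFFD. A plain ? is not flagged because it cannot be distinguished from an ordinary question mark.

```python
import re

# One or more consecutive literal escapes such as \xe2\x86\x92
_ESCAPED_RUN = re.compile(r"(?:\\x[0-9a-fA-F]{2})+")


def unescape_utf8_bytes(text: str) -> str:
    """Replace literal escape runs with the characters their bytes encode."""
    def _decode(match):
        raw = bytes.fromhex(match.group(0).replace("\\x", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)  # not valid UTF-8: leave the escapes as they are
    return _ESCAPED_RUN.sub(_decode, text)


def has_lost_characters(text: str) -> bool:
    """True if the text contains U+FFFD, meaning the original bytes are gone."""
    return "\ufffd" in text


print(unescape_utf8_bytes("55 \\xe2\\x86\\x92 98"))  # 55 → 98
print(has_lost_characters("55 \ufffd 98"))            # True
```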

Fix UTF-8 double-encoding corruption where special characters like arrows (→, ↔) become garbled sequences like "Ã¢ÂÂ" or "Ã¢Â†Â". Use when: (1) Non-ASCII characters display as mojibake after migration/serialization, (2) Arrows, emojis, or accented characters become Ã-prefixed garbage, (3) Content looks correct in source but corrupted after processing through gray-matter, YAML, or text pipelines. Covers detection via hex inspection and fix via latin-1 decode chain.

Stars: 1
Category: development
Updated: Apr 17, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add Dbochman/dotfiles utf8-double-encoding-fix

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

- Trivy (container and dependency vulnerability scanner): Clean, 95%
- Semgrep (static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (open source security scanning): Skipped, 50%
- Socket.dev (supply chain security analysis): Skipped, 50%
- VirusTotal (multi-engine malware detection): Skipped, 50%
- CrowdStrike (advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 17, 2026, 1:49 AM (5.7s, 1 file scanned)

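SKILL.md

name: utf8-double-encoding-fix
description: |
  Fix UTF-8 double-encoding corruption where special characters like arrows (→, ↔)
  become garbled sequences like "Ã¢ÂÂ" or "Ã¢Â†Â". Use when: (1) Non-ASCII
  characters display as mojibake after migration/serialization, (2) Arrows, emojis,
  or accented characters become Ã-prefixed garbage, (3) Content looks correct in
  source but corrupted after processing through gray-matter, YAML, or text pipelines.
  Covers detection via hex inspection and fix via latin-1 decode chain.
author: Claude Code
version: 1.0.0
date: 2026-01-23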

Install manually

# Clone the repo
git clone https://github.com/Dbochman/dotfiles.git

# Copy into Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r dotfiles/.claude/skills/utf8-double-encoding-fix ~/.claude/skills/

Repository: Dbochman/dotfiles (1 star)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT