Encoding Bypass Anti-Pattern

Severity: High

Summary

Encoding bypass evades security checks via alternate encodings. Occurs when validation happens before decoding/normalization. Encoded payload appears safe but becomes malicious after processing. Bypasses WAFs, input filters, enables XSS and SQL injection.

The Anti-Pattern

Flawed order of operations: Validate then Decode/Normalize. Security checks run on encoded data, application later uses decoded version, re-introducing the vulnerability.

BAD Code Example

# VULNERABLE: Validation happens before Unicode normalization.
import unicodedata

def is_safe_username(username):
    # This check is flawed because it doesn't account for Unicode variants.
    if '<' in username or '>' in username:
        return False
    return True

def create_user_profile(username):
    if not is_safe_username(username):
        raise ValueError("Invalid characters in username.")

    # The application later normalizes the username for display or storage.
    # The full-width less-than sign '＜' (U+FF1C) was not caught by the check.
    # It gets normalized into the standard '<' (U+003C), enabling XSS.
    normalized_username = unicodedata.normalize('NFKC', username)

    # This will render the malicious script tag.
    return f"<div>Welcome, {normalized_username}</div>"

# Attacker's input: '＜script＞alert(1)＜/script＞'
# is_safe_username returns True.
# The normalized output becomes '<div>Welcome, <script>alert(1)</script></div>'

GOOD Code Example

# SECURE: Normalize then validate.
import unicodedata

def is_safe_username(username):
    # This check is now effective because it runs on the canonical form of the input.
    if '<' in username or '>' in username:
        return False
    return True

def create_user_profile(username):
    # First, normalize the input to its canonical form.
    normalized_username = unicodedata.normalize('NFKC', username)

    # Then, perform the security validation on the normalized data.
    if not is_safe_username(normalized_username):
        raise ValueError("Invalid characters in username.")

    # Now it's safe to use the normalized username.
    return f"<div>Welcome, {normalized_username}</div>"

JavaScript/Node.js Examples

BAD:

// VULNERABLE: Validation before URL decoding in path traversal
const express = require('express');
const fs = require('fs');
const path = require('path');

app.get('/file/:filename', (req, res) => {
    const filename = req.params.filename;

    // Check for path traversal - but filename is still encoded
    if (filename.includes('..')) {
        return res.status(400).send('Invalid filename');
    }

    // Express automatically decodes URL parameters
    // Attack: filename = "..%2F..%2Fetc%2Fpasswd"
    // After decoding: "../../etc/passwd" - bypasses the check
    const filePath = path.join('/uploads', filename);
    res.sendFile(filePath);
});

// Attack payload: GET /file/..%252F..%252Fetc%252Fpasswd
// Double encoding: %252F becomes %2F, then becomes /

GOOD:

// SECURE: Decode then validate
const express = require('express');
const fs = require('fs');
const path = require('path');

app.get('/file/:filename', (req, res) => {
    // Express already decoded once, but check for double encoding
    let filename = decodeURIComponent(req.params.filename);

    // Normalize to canonical form
    filename = path.normalize(filename);

    // Now validate the normalized path
    if (filename.includes('..') || path.isAbsolute(filename)) {
        return res.status(400).send('Invalid filename');
    }

    // Safe to use
    const filePath = path.join('/uploads', filename);
    res.sendFile(filePath);
});

Java Examples

BAD:

// VULNERABLE: SQL injection via URL decoding bypass
import java.net.URLDecoder;
import java.sql.*;

public void searchUser(String encodedQuery) {
    // Validate before decoding
    if (encodedQuery.contains("'") || encodedQuery.contains("--")) {
        throw new SecurityException("Invalid characters");
    }

    // Decode after validation
    String query = URLDecoder.decode(encodedQuery, "UTF-8");

    // Attack: encodedQuery = "admin%27%20OR%20%271%27%3D%271"
    // After decode: "admin' OR '1'='1" - bypasses the check
    String sql = "SELECT * FROM users WHERE name = '" + query + "'";
    Statement stmt = connection.createStatement();
    ResultSet rs = stmt.executeQuery(sql);
}

GOOD:

// SECURE: Decode then validate (but use parameterized queries)
import java.net.URLDecoder;
import java.sql.*;
import java.util.regex.Pattern;

public void searchUser(String encodedQuery) {
    // Decode to canonical form first
    String query = URLDecoder.decode(encodedQuery, "UTF-8");

    // Validate the decoded form
    if (!Pattern.matches("^[a-zA-Z0-9_]+$", query)) {
        throw new SecurityException("Invalid characters");
    }

    // Use parameterized query (best practice)
    String sql = "SELECT * FROM users WHERE name = ?";
    PreparedStatement stmt = connection.prepareStatement(sql);
    stmt.setString(1, query);
    ResultSet rs = stmt.executeQuery();
}

Detection

Python:

Validation before unicodedata.normalize()
Input checks before urllib.parse.unquote()
Regex patterns before string normalization
HTML entity validation before html.unescape()

JavaScript/Node.js:

Validation before decodeURIComponent()
Path checks before path.normalize()
Express middleware order (validate before decode)
Input checks before Buffer.from(input, 'base64')

Java:

Validation before URLDecoder.decode()
Security checks before Normalizer.normalize()
Input validation before StringEscapeUtils.unescapeHtml()
Path validation before Paths.get().normalize()

PHP:

Validation before urldecode()
Input checks before html_entity_decode()
Path validation before realpath()

Search Patterns:

Grep: normalize\(|decode\(|unescape\(|URLDecoder|decodeURIComponent
Look for validation logic (if statements, regex) before these functions
Check for double decoding: multiple decode calls in sequence
Review web framework routing (automatic decoding may occur)

Common Encoding Bypass Techniques:

URL encoding: %3c for <, %2e%2e%2f for ../
Double URL encoding: %253c for < (decoded twice)
Unicode variants: ＜ (U+FF1C) for <
HTML entities: < or < for <
Unicode escapes: \u003c for <
Mixed encoding: %u003c or %c0%bc for <
Path traversal: ..%2f, ..%5c, %2e%2e/

Prevention

[ ] Normalize/decode before validation: Always bring data to its simplest, canonical form before performing any security checks on it.
[ ] Use parameterized queries (for SQL) and other safe APIs that handle encoding internally. This is the best defense against injection attacks.
[ ] Enforce strict character encoding for all input (e.g., reject any data that is not valid UTF-8).
[ ] Be aware of implicit decoding performed by your web framework or libraries and ensure your validation logic runs after it.
[ ] Canonicalize paths and URLs before validating them to prevent path traversal attacks.

Testing for Encoding Bypass

Manual Testing:

Test URL encoding: %3cscript%3e, ..%2f..%2f
Test double encoding: %253cscript%253e, ..%252f..%252f
Test Unicode variants: ＜script＞, ．．／
Test HTML entities: <script>, <script>
Test mixed encoding: %u003cscript%u003e
Verify filters catch all encoding variants

Automated Testing:

Static Analysis: Semgrep, CodeQL to detect validation-before-decode patterns
DAST: Burp Suite Intruder with encoding payloads, OWASP ZAP fuzzer
Payload Lists: SecLists encoding bypass payloads
Custom Scripts: Automated encoding variant generation

Example Test:

# Test that validation occurs after normalization
def test_encoding_bypass_prevention():
    # Unicode variant of '<script>'
    malicious_input = "＜script＞alert(1)＜/script＞"

    try:
        create_user_profile(malicious_input)
        assert False, "Should reject encoded malicious input"
    except ValueError:
        pass  # Expected

    # Double URL encoding
    encoded_input = "%253cscript%253e"
    try:
        search_user(encoded_input)
        assert False, "Should reject double-encoded input"
    except SecurityException:
        pass  # Expected

Burp Suite Test:

# Intruder payload positions
GET /file/§..%2f..%2fetc%2fpasswd§

# Payload list (encoding variants)
..%2f..%2f
..%252f..%252f
．．／．．／
%2e%2e%2f%2e%2e%2f

Remediation Steps

Identify decoding operations - Use detection patterns to find decode/normalize functions
Trace data flow - Follow user input from entry to security validation
Check validation order - Verify decode/normalize happens before validation
Reverse order if needed - Move normalization before security checks
Add missing normalization - Insert decode/normalize if absent
Test with encoding variants - Use payload list from Testing section
Verify canonicalization - Ensure all paths/URLs are normalized
Review framework behavior - Check for automatic decoding in web framework

Related Security Patterns & Anti-Patterns

SQL Injection Anti-Pattern: A common goal of encoding bypass attacks.
Cross-Site Scripting (XSS) Anti-Pattern: Often enabled by bypassing filters with encoded payloads.
Path Traversal Anti-Pattern: Can be achieved by using encoded representations of ../.
Unicode Security Anti-Pattern: A collection of issues related to handling Unicode securely.

References

OWASP Top 10 A05:2025 - Injection
OWASP GenAI LLM05:2025 - Improper Output Handling
OWASP API Security API8:2023 - Security Misconfiguration
OWASP Testing for HTTP Incoming Requests
CWE-838: Inappropriate Encoding for Output Context
CAPEC-267: Leverage Alternate Encoding
Source: sec-context

Encoding Bypass Anti-Pattern

Severity: High

Summary

The Anti-Pattern

Flawed order of operations: Validate then Decode/Normalize. Security checks run on encoded data, application later uses decoded version, re-introducing the vulnerability.

BAD Code Example

# VULNERABLE: Validation happens before Unicode normalization.
import unicodedata

def is_safe_username(username):
    # This check is flawed because it doesn't account for Unicode variants.
    if '<' in username or '>' in username:
        return False
    return True

def create_user_profile(username):
    if not is_safe_username(username):
        raise ValueError("Invalid characters in username.")

    # The application later normalizes the username for display or storage.
    # The full-width less-than sign '＜' (U+FF1C) was not caught by the check.
    # It gets normalized into the standard '<' (U+003C), enabling XSS.
    normalized_username = unicodedata.normalize('NFKC', username)

    # This will render the malicious script tag.
    return f"<div>Welcome, {normalized_username}</div>"

# Attacker's input: '＜script＞alert(1)＜/script＞'
# is_safe_username returns True.
# The normalized output becomes '<div>Welcome, <script>alert(1)</script></div>'

GOOD Code Example

# SECURE: Normalize then validate.
import unicodedata

def is_safe_username(username):
    # This check is now effective because it runs on the canonical form of the input.
    if '<' in username or '>' in username:
        return False
    return True

def create_user_profile(username):
    # First, normalize the input to its canonical form.
    normalized_username = unicodedata.normalize('NFKC', username)

    # Then, perform the security validation on the normalized data.
    if not is_safe_username(normalized_username):
        raise ValueError("Invalid characters in username.")

    # Now it's safe to use the normalized username.
    return f"<div>Welcome, {normalized_username}</div>"

JavaScript/Node.js Examples

BAD:

// VULNERABLE: Validation before URL decoding in path traversal
const express = require('express');
const fs = require('fs');
const path = require('path');

app.get('/file/:filename', (req, res) => {
    const filename = req.params.filename;

    // Check for path traversal - but filename is still encoded
    if (filename.includes('..')) {
        return res.status(400).send('Invalid filename');
    }

    // Express automatically decodes URL parameters
    // Attack: filename = "..%2F..%2Fetc%2Fpasswd"
    // After decoding: "../../etc/passwd" - bypasses the check
    const filePath = path.join('/uploads', filename);
    res.sendFile(filePath);
});

// Attack payload: GET /file/..%252F..%252Fetc%252Fpasswd
// Double encoding: %252F becomes %2F, then becomes /

GOOD:

// SECURE: Decode then validate
const express = require('express');
const fs = require('fs');
const path = require('path');

app.get('/file/:filename', (req, res) => {
    // Express already decoded once, but check for double encoding
    let filename = decodeURIComponent(req.params.filename);

    // Normalize to canonical form
    filename = path.normalize(filename);

    // Now validate the normalized path
    if (filename.includes('..') || path.isAbsolute(filename)) {
        return res.status(400).send('Invalid filename');
    }

    // Safe to use
    const filePath = path.join('/uploads', filename);
    res.sendFile(filePath);
});

Java Examples

BAD:

// VULNERABLE: SQL injection via URL decoding bypass
import java.net.URLDecoder;
import java.sql.*;

public void searchUser(String encodedQuery) {
    // Validate before decoding
    if (encodedQuery.contains("'") || encodedQuery.contains("--")) {
        throw new SecurityException("Invalid characters");
    }

    // Decode after validation
    String query = URLDecoder.decode(encodedQuery, "UTF-8");

    // Attack: encodedQuery = "admin%27%20OR%20%271%27%3D%271"
    // After decode: "admin' OR '1'='1" - bypasses the check
    String sql = "SELECT * FROM users WHERE name = '" + query + "'";
    Statement stmt = connection.createStatement();
    ResultSet rs = stmt.executeQuery(sql);
}

GOOD:

// SECURE: Decode then validate (but use parameterized queries)
import java.net.URLDecoder;
import java.sql.*;
import java.util.regex.Pattern;

public void searchUser(String encodedQuery) {
    // Decode to canonical form first
    String query = URLDecoder.decode(encodedQuery, "UTF-8");

    // Validate the decoded form
    if (!Pattern.matches("^[a-zA-Z0-9_]+$", query)) {
        throw new SecurityException("Invalid characters");
    }

    // Use parameterized query (best practice)
    String sql = "SELECT * FROM users WHERE name = ?";
    PreparedStatement stmt = connection.prepareStatement(sql);
    stmt.setString(1, query);
    ResultSet rs = stmt.executeQuery();
}

Detection

Python:

Validation before unicodedata.normalize()
Input checks before urllib.parse.unquote()
Regex patterns before string normalization
HTML entity validation before html.unescape()

JavaScript/Node.js:

Validation before decodeURIComponent()
Path checks before path.normalize()
Express middleware order (validate before decode)
Input checks before Buffer.from(input, 'base64')

Java:

Validation before URLDecoder.decode()
Security checks before Normalizer.normalize()
Input validation before StringEscapeUtils.unescapeHtml()
Path validation before Paths.get().normalize()

PHP:

Validation before urldecode()
Input checks before html_entity_decode()
Path validation before realpath()

Search Patterns:

Grep: normalize\(|decode\(|unescape\(|URLDecoder|decodeURIComponent
Look for validation logic (if statements, regex) before these functions
Check for double decoding: multiple decode calls in sequence
Review web framework routing (automatic decoding may occur)

Common Encoding Bypass Techniques:

URL encoding: %3c for <, %2e%2e%2f for ../
Double URL encoding: %253c for < (decoded twice)
Unicode variants: ＜ (U+FF1C) for <
HTML entities: < or < for <
Unicode escapes: \u003c for <
Mixed encoding: %u003c or %c0%bc for <
Path traversal: ..%2f, ..%5c, %2e%2e/

Prevention

[ ] Normalize/decode before validation: Always bring data to its simplest, canonical form before performing any security checks on it.
[ ] Use parameterized queries (for SQL) and other safe APIs that handle encoding internally. This is the best defense against injection attacks.
[ ] Enforce strict character encoding for all input (e.g., reject any data that is not valid UTF-8).
[ ] Be aware of implicit decoding performed by your web framework or libraries and ensure your validation logic runs after it.
[ ] Canonicalize paths and URLs before validating them to prevent path traversal attacks.

Testing for Encoding Bypass

Manual Testing:

Test URL encoding: %3cscript%3e, ..%2f..%2f
Test double encoding: %253cscript%253e, ..%252f..%252f
Test Unicode variants: ＜script＞, ．．／
Test HTML entities: <script>, <script>
Test mixed encoding: %u003cscript%u003e
Verify filters catch all encoding variants

Automated Testing:

Static Analysis: Semgrep, CodeQL to detect validation-before-decode patterns
DAST: Burp Suite Intruder with encoding payloads, OWASP ZAP fuzzer
Payload Lists: SecLists encoding bypass payloads
Custom Scripts: Automated encoding variant generation

Example Test:

# Test that validation occurs after normalization
def test_encoding_bypass_prevention():
    # Unicode variant of '<script>'
    malicious_input = "＜script＞alert(1)＜/script＞"

    try:
        create_user_profile(malicious_input)
        assert False, "Should reject encoded malicious input"
    except ValueError:
        pass  # Expected

    # Double URL encoding
    encoded_input = "%253cscript%253e"
    try:
        search_user(encoded_input)
        assert False, "Should reject double-encoded input"
    except SecurityException:
        pass  # Expected

Burp Suite Test:

# Intruder payload positions
GET /file/§..%2f..%2fetc%2fpasswd§

# Payload list (encoding variants)
..%2f..%2f
..%252f..%252f
．．／．．／
%2e%2e%2f%2e%2e%2f

Remediation Steps

Identify decoding operations - Use detection patterns to find decode/normalize functions
Trace data flow - Follow user input from entry to security validation
Check validation order - Verify decode/normalize happens before validation
Reverse order if needed - Move normalization before security checks
Add missing normalization - Insert decode/normalize if absent
Test with encoding variants - Use payload list from Testing section
Verify canonicalization - Ensure all paths/URLs are normalized
Review framework behavior - Check for automatic decoding in web framework

Related Security Patterns & Anti-Patterns

SQL Injection Anti-Pattern: A common goal of encoding bypass attacks.
Cross-Site Scripting (XSS) Anti-Pattern: Often enabled by bypassing filters with encoded payloads.
Path Traversal Anti-Pattern: Can be achieved by using encoded representations of ../.
Unicode Security Anti-Pattern: A collection of issues related to handling Unicode securely.

References

OWASP Top 10 A05:2025 - Injection
OWASP GenAI LLM05:2025 - Improper Output Handling
OWASP API Security API8:2023 - Security Misconfiguration
OWASP Testing for HTTP Incoming Requests
CWE-838: Inappropriate Encoding for Output Context
CAPEC-267: Leverage Alternate Encoding
Source: sec-context

Adoption

igbuend/encoding-bypass-anti-pattern

$ install --global

Security Scan Results

SKILL.md

Encoding Bypass Anti-Pattern

Summary

The Anti-Pattern

BAD Code Example

GOOD Code Example

JavaScript/Node.js Examples

Java Examples

Detection

Prevention

Testing for Encoding Bypass

Remediation Steps

Related Security Patterns & Anti-Patterns

References

Related Skills

igbuend/xss-anti-pattern

igbuend/xpath-injection-anti-pattern

igbuend/weak-password-hashing-anti-pattern

igbuend/weak-encryption-anti-pattern

igbuend/encoding-bypass-anti-pattern

$ install --global

Security Scan Results

SKILL.md

Encoding Bypass Anti-Pattern

Summary

The Anti-Pattern

BAD Code Example

GOOD Code Example

JavaScript/Node.js Examples

Java Examples

Detection

Prevention

Testing for Encoding Bypass

Remediation Steps

Related Security Patterns & Anti-Patterns

References

Related Skills

igbuend/xss-anti-pattern

igbuend/xpath-injection-anti-pattern

igbuend/weak-password-hashing-anti-pattern

igbuend/weak-encryption-anti-pattern