coding/code-guidelines/SKILL.md
Behavioral guidelines to reduce common LLM coding mistakes, derived from Andrej Karpathy's observations on LLM coding pitfalls (Dec 2025). Use when writing, reviewing, or refactoring code across any language to avoid hidden assumptions, overengineering, orthogonal edits, vague goals, and sycophantic approval of bad requests. Especially relevant in agent-first workflows where errors are subtle conceptual mistakes rather than simple syntax issues.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add aeondave/malskill code-guidelines
Five behavioral principles addressing the most common failure modes of LLM-assisted coding, as identified by Andrej Karpathy (December 2025).
"The models make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should."
"They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code — implement a bloated construction over 1,000 lines when 100 would do."
"They still sometimes change/remove comments and code they don't sufficiently understand as side effects, even if orthogonal to the task."
— Karpathy, Dec 2025
Context: With LLM agents writing most of the code in agent-first workflows, errors are no longer simple syntax mistakes but subtle conceptual errors of the kind a slightly sloppy, hasty junior developer might make: harder to spot, and higher stakes. Watch the output like a hawk.
Tradeoff: These guidelines bias toward caution over speed. For trivial tasks (obvious one-liners, simple typos), apply judgment — not every change needs the full rigor.
Don't assume. Don't hide confusion. Surface tradeoffs.
Before implementing:
Failure mode: silently picking an interpretation and running 200 lines in the wrong direction.
Minimum code that solves the problem. Nothing speculative.
The test: would a senior engineer say this is overcomplicated? If yes, simplify.
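To make the test concrete, here is a minimal Python sketch built around a hypothetical request: "give 10% off orders over 100". The class names, registry, and factory are illustrative inventions, not taken from any real codebase. The first part shows the speculative kind of abstraction to avoid; the final function is all the request actually needs.

```python
from abc import ABC, abstractmethod


# Speculative version: an abstract strategy, a registry, and a factory
# for a single discount rule nobody asked to make pluggable.
class DiscountStrategy(ABC):
    @abstractmethod
    def apply(self, total: float) -> float: ...


class ThresholdDiscount(DiscountStrategy):
    def __init__(self, threshold: float, rate: float) -> None:
        self.threshold = threshold
        self.rate = rate

    def apply(self, total: float) -> float:
        return total * (1 - self.rate) if total > self.threshold else total


_REGISTRY: dict[str, type[DiscountStrategy]] = {"threshold": ThresholdDiscount}


def make_strategy(name: str, **kwargs: float) -> DiscountStrategy:
    # Looks up a class by name and forwards keyword arguments to it.
    return _REGISTRY[name](**kwargs)


# Minimal version: exactly what the (hypothetical) request asked for.
def apply_discount(total: float) -> float:
    """10% off orders over 100; everything else is unchanged."""
    return total * 0.9 if total > 100 else total
```

Both versions return the same totals. The extra layers only pay for themselves if a second discount rule actually arrives, and they can be introduced at that point.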
Touch only what you must. Clean up only your own mess.
When editing existing code:
When your changes create orphans:
The test: every changed line should trace directly to the user's request.
Define success criteria. Loop until verified.
"LLMs are exceptionally good at looping until they meet specific goals. Don't tell it what to do, give it success criteria and watch it go. Change your approach from imperative to declarative to get the agents looping longer and gain leverage." — Karpathy
Transform imperative tasks into verifiable goals:
| Imperative (weak) | Declarative goal (strong) |
|---|---|
| "Add validation" | "Write tests for invalid inputs, then make them pass" |
| "Fix the bug" | "Write a test that reproduces it, then make it pass" |
| "Refactor X" | "Ensure tests pass before and after" |
| "Make it faster" | "Benchmark first; target p99 < 100 ms; verify no regression" |
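A minimal Python sketch of the first row, assuming a hypothetical `parse_age` helper. The tests state the success criteria (which inputs must be rejected, which must still be accepted), so an agent can loop until they pass instead of guessing what "add validation" means. In practice the two test functions would be written first and the implementation changed until they pass. The function name, the 0 to 150 range, and the plain-`assert` test style are illustrative assumptions.

```python
def parse_age(value: str) -> int:
    """Parse an age string; hypothetical rule: an integer in the range 0..150."""
    age = int(value)  # int() raises ValueError for "", "abc", "4.5"
    if not 0 <= age <= 150:
        raise ValueError(f"age out of range: {age}")
    return age


def test_rejects_invalid_inputs() -> None:
    # Success criterion 1: every one of these inputs must raise ValueError.
    for bad in ["", "abc", "4.5", "-1", "200"]:
        try:
            parse_age(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")


def test_accepts_valid_inputs() -> None:
    # Success criterion 2: valid input still parses to the right value.
    assert parse_age("0") == 0
    assert parse_age("42") == 42
    assert parse_age("150") == 150


if __name__ == "__main__":
    test_rejects_invalid_inputs()
    test_accepts_valid_inputs()
    print("all validation tests pass")
```

The file runs directly, and pytest would also collect the `test_` functions. The same pattern covers "Fix the bug": write the failing reproduction as a test first, then change the code until it passes.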
For multi-step tasks, state a brief plan before starting:
1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]
Strong success criteria enable independent looping. Weak criteria ("make it work") require constant clarification.
Review your own output with fresh eyes before marking it done.
After generating a solution:
Effective practice: review the output with a fresh context window before finalizing — this catches issues that human review often misses.
| File | When to load |
|---|---|
| references/examples.md | Concrete before/after examples for all five principles |