skills/advanced/intellectual-honesty-enforcement/SKILL.md
# Intellectual Honesty Enforcement - Anti-Handwaving and Commitment Testing

## Core Capability

Systematic detection and prevention of intellectual dishonesty patterns including handwaving, hedge-word abuse, commitment avoidance, and the "Problem:" antipattern. Forces concrete positions, specific claims, and honest acknowledgment of limitations.

## Key Functions

### 1. Handwaving Detection and Elimination

- Identify vague architectural diagrams masquerading as solutions
- Detect and eliminate
Handwaving Example:
"Step 3: Functional equivalence check between AI solutions"
Detection Criteria:
├── Undefined algorithm for complex computational problem
├── No discussion of computational complexity
├── No failure mode analysis
└── Presented as simple pipeline step
Reality Check Required:
├── Specify exact algorithm for equivalence checking
├── Prove algorithm correctness or admit undecidability
├── Analyze computational complexity and resource requirements
└── Address partial equivalence and edge case handling
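One way to make the "prove correctness or admit undecidability" requirement concrete: general program equivalence is undecidable, so a practical check can only search for counterexamples. The sketch below is a randomized differential test. The function name `find_counterexample` and its parameters are hypothetical, chosen for illustration, and a `None` result is evidence of agreement on the sampled inputs only, never a proof of equivalence.

```python
import random


def find_counterexample(f, g, gen_input, trials=1000, seed=0):
    """Search for an input on which f and g disagree.

    Returns the first disagreeing input, or None if none was found
    within `trials` samples. None does NOT prove equivalence.
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(trials):
        x = gen_input(rng)
        if f(x) != g(x):
            return x
    return None


# Hypothetical functions under comparison.
double_a = lambda x: x * 2
double_b = lambda x: x + x
double_buggy = lambda x: 0 if x == 7 else x * 2
small_ints = lambda rng: rng.randint(0, 10)

agreeing = find_counterexample(double_a, double_b, small_ints)
disagreeing = find_counterexample(double_a, double_buggy, small_ints)
```

The cost is linear in `trials`, and coverage depends entirely on the input generator, which is exactly the kind of limitation the reality check asks to be stated up front.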
Handwaving Example:
"Use verification through diversity framework"
Detection Criteria:
├── High-level boxes and arrows without operational detail
├── Missing cost/performance analysis
├── No integration complexity consideration
└── Assumes away hardest technical problems
Reality Check Required:
├── Specify each component's implementation approach
├── Calculate realistic resource requirements
├── Map integration points and failure modes
└── Acknowledge unsolved subproblems
Hedge Pattern → Commitment Enforcement
├── "Could potentially" → "Will" or "Will not" with conditions
├── "Might be possible" → Specific feasibility analysis
├── "Should generally work" → Success probability with confidence interval
├── "Seems like it would" → Concrete evidence or acknowledge speculation
├── "Probably sufficient" → Quantitative analysis or admit uncertainty
└── "Potentially addresses" → Either addresses or doesn't, with proof
Before: "This approach could potentially provide better security"
After: "This approach provides defense against attacks X, Y, Z but remains vulnerable to attack W. Security improvement quantified as 40% reduction in attack surface based on analysis of common vulnerability patterns."
Before: "The system should generally scale to meet requirements"
After: "System scales to 10,000 concurrent users based on load testing. Bottleneck emerges at database connection pool (1,000 connections). Requires horizontal sharding or read replicas beyond this point."
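The hedge table above can be applied mechanically as a first pass. The following is a minimal sketch, not a complete detector: `HEDGE_REPLACEMENTS` and `find_hedges` are hypothetical names, and the phrase "should generally" is shortened from the table's "Should generally work" so that it also matches the second "Before" example.

```python
import re

# Hedge phrases from the pattern table, mapped to the commitment
# that should replace each one. "should generally" is a shortened
# form of the table entry (an assumption made for broader matching).
HEDGE_REPLACEMENTS = {
    "could potentially": '"Will" or "Will not" with conditions',
    "might be possible": "Specific feasibility analysis",
    "should generally": "Success probability with confidence interval",
    "seems like it would": "Concrete evidence or acknowledge speculation",
    "probably sufficient": "Quantitative analysis or admit uncertainty",
    "potentially addresses": "Either addresses or doesn't, with proof",
}


def find_hedges(text):
    """Return (hedge phrase, required replacement) pairs found in text."""
    lowered = text.lower()
    return [
        (phrase, required)
        for phrase, required in HEDGE_REPLACEMENTS.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]
```

A phrase list like this only flags candidates. Deciding what the committed statement should say still requires the analysis shown in the "After" examples.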
Technical Assertion → Required Specificity
├── Performance Claims
│ ├── Specific metrics with measurement methodology
│ ├── Test conditions and environmental factors
│ ├── Confidence intervals and statistical significance
│ └── Comparison baseline and measurement variance
├── Security Claims
│ ├── Specific attack models defended against
│ ├── Assumptions about attacker capabilities
│ ├── Formal security definitions if applicable
│ └── Known limitations and out-of-scope threats
├── Cost and Resource Claims
│ ├── SOURCE REQUIRED: Cite specific studies, vendor quotes, or project data
│ ├── UNCERTAINTY ACKNOWLEDGED: If no reliable data, state "cost unknown"
│ ├── METHODOLOGY SPECIFIED: How estimates were derived
│ └── RANGE PROVIDED: Best/worst case scenarios with confidence levels
├── Scalability Claims
│ ├── Specific scaling limits with bottleneck analysis
│ ├── Resource requirements as function of scale
│ ├── Operational complexity growth patterns
│ └── Cost scaling characteristics
└── Reliability Claims
  ├── Specific failure modes addressed
  ├── MTBF/MTTR estimates with basis
  ├── Monitoring and alerting requirements
  └── Recovery procedures and human factors
Claim Pattern → Validation Required
├── "$X million cost" → Source: industry report, vendor quote, project experience?
├── "Y% improvement" → Methodology: controlled experiment, benchmark, modeling?
├── "Z-month timeline" → Basis: similar project data, expert estimate, team assessment?
└── "A% accuracy rate" → Evidence: peer-reviewed study, internal testing, vendor claims?
Enforcement Actions:
├── No Source → Replace with "cost unknown" or "timeline uncertain"
├── Weak Source → Acknowledge uncertainty: "estimated based on limited data"
├── Invalid Source → Remove claim entirely
└── Strong Source → Include source citation and methodology
Before: "Multiple AI verification costs $2M setup, 10x development time"
After: "Multiple AI verification costs unknown - no reliable industry data available for enterprise-scale AI verification systems"
Before: "System achieves 99.9% security improvement"
After: "System addresses specific attack vectors X, Y, Z but no quantitative security metrics available"
Before: "Implementation requires 6-month timeline"
After: "Implementation timeline uncertain - depends on team size, complexity, and integration requirements (estimated range: 3-12 months based on similar system complexity)"
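The enforcement actions above amount to a small decision table keyed on source strength. A possible sketch follows, assuming a hypothetical `SourceStrength` enum and `enforce_claim` helper. How a source gets classified as weak, invalid, or strong is left to the reviewer and is not modeled here.

```python
from enum import Enum
from typing import Optional


class SourceStrength(Enum):
    NONE = "none"
    WEAK = "weak"
    INVALID = "invalid"
    STRONG = "strong"


def enforce_claim(claim: str, strength: SourceStrength,
                  source: str = "", fallback: str = "cost unknown") -> Optional[str]:
    """Apply the enforcement action for a claim's source strength.

    Returns the rewritten claim, or None when the claim must be removed.
    """
    if strength is SourceStrength.NONE:
        return fallback  # e.g. "cost unknown" or "timeline uncertain"
    if strength is SourceStrength.WEAK:
        return f"{claim} (estimated based on limited data)"
    if strength is SourceStrength.INVALID:
        return None  # remove the claim entirely
    if not source:
        raise ValueError("a strong source must be cited")
    return f"{claim} (source: {source})"
```

For example, `enforce_claim("Setup costs $2M", SourceStrength.NONE)` yields `"cost unknown"`, matching the first "Before/After" pair.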
Problem: Multiple valid approaches exist
Handwaving Response: "It depends on requirements"
Commitment Requirement:
├── Enumerate specific decision criteria
├── Rank criteria by importance with justification
├── Apply criteria to generate recommendation
├── Acknowledge trade-offs explicitly
└── Specify conditions that would change recommendation
Handwaving: "Choose database based on requirements"
Commitment Enforcement:
├── Decision Criteria (in priority order):
│ 1. Query pattern complexity (60% weight)
│ 2. Scaling requirements (25% weight)
│ 3. Operational expertise (15% weight)
├── Specific Recommendation: PostgreSQL with read replicas
├── Justification:
│ - Query patterns require complex joins (favor relational)
│ - Scale target <100K QPS (within PostgreSQL capability)
│ - Team has 5 years PostgreSQL experience
├── Trade-offs Acknowledged:
│ - Vertical scaling limits at ~500K QPS
│ - No horizontal sharding built-in
│ - ACID guarantees limit some performance optimizations
└── Condition for Reconsideration:
  - If scale requirements exceed 300K QPS, reevaluate
  - If query patterns become document-heavy, consider MongoDB
  - If real-time analytics needed, add OLAP database
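The weighted criteria in the database example can be turned into an explicit scoring step, which makes the recommendation reproducible and shows what would change it. In this sketch the weights come from the example above, while the per-candidate scores (0-10) are hypothetical placeholders, not measurements.

```python
# Weights from the example: query patterns 60%, scaling 25%, expertise 15%.
WEIGHTS = {"query_complexity": 0.60, "scaling": 0.25, "ops_expertise": 0.15}


def rank_options(scores):
    """Rank candidates by weighted score, highest first.

    `scores` maps candidate name -> {criterion: score on a 0-10 scale}.
    """
    totals = {
        name: sum(WEIGHTS[criterion] * values[criterion] for criterion in WEIGHTS)
        for name, values in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores, for illustration only.
candidates = {
    "PostgreSQL": {"query_complexity": 9, "scaling": 6, "ops_expertise": 9},
    "MongoDB": {"query_complexity": 5, "scaling": 8, "ops_expertise": 4},
}
ranking = rank_options(candidates)
```

Rerunning the ranking with different scores, for instance a higher scaling requirement, is one way to test the reconsideration conditions instead of asserting them.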
Pattern: Every solution immediately followed by "Problem:"
Example:
"Approach 1: Use formal verification
Problem: Requires expertise and time
Approach 2: Use multiple AI systems
Problem: Correlated failures possible
Approach 3: Use human review
Problem: Doesn't scale"
Issues with Pattern:
├── Demonstrates knowledge without solving anything
├── Creates appearance of thoroughness while avoiding commitment
├── Often restates obvious limitations without analysis
└── Prevents forward progress on hard problems
Replace With: Comparative Analysis and Commitment
├── Quantify trade-offs with specific metrics
├── Rank approaches by relevant criteria
├── Make recommendation despite imperfections
├── Specify conditions for approach selection
└── Acknowledge limitations while committing to best option
Enhanced Example:
"Formal verification provides highest assurance (99.9% defect detection) but requires 6-month implementation timeline and $2M investment in tooling/training. Multiple AI systems provide medium assurance (80% defect detection) with 2-month timeline and $200K cost. Human review provides baseline assurance (60% defect detection) with immediate implementation.
Recommendation: Multiple AI systems for current project timeline, with formal verification planned for next product generation. Human review as fallback for AI system failures."
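The "Problem:" antipattern has a recognizable shape: each approach line is immediately followed by a "Problem:" line and no recommendation ever appears. A rough heuristic for spotting it is sketched below. The function name and the reliance on the literal prefixes "Approach", "Problem:", and "Recommendation:" are assumptions based on the example text, so real documents would need looser matching.

```python
def is_problem_antipattern(text, min_pairs=2):
    """Heuristically detect the 'Problem:' antipattern.

    True when at least `min_pairs` approach lines exist, every one is
    immediately followed by a 'Problem:' line, and no line commits to
    a recommendation.
    """
    lines = [ln.strip().strip('"') for ln in text.splitlines() if ln.strip()]
    approaches = 0
    paired = 0
    for i, line in enumerate(lines):
        if line.lower().startswith("approach"):
            approaches += 1
            if i + 1 < len(lines) and lines[i + 1].lower().startswith("problem:"):
                paired += 1
    has_commitment = any(ln.lower().startswith("recommendation:") for ln in lines)
    return approaches >= min_pairs and paired == approaches and not has_commitment
```

Run against the three-approach example above, this returns `True`. Text that ends with a "Recommendation:" line is not flagged.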
Original Problem: "How do you establish cryptographic trust when the model is non-deterministic?"
Redefinition: "How do we build better verification systems?"
Issues:
├── Changes problem to fit available solutions
├── Avoids confronting fundamental difficulties
├── Optimizes for answerable questions over important ones
└── Gives appearance of progress while making none
Problem Commitment Enforcement:
├── Restate original problem explicitly
├── Map relationship between reframed and original problem
├── Address why reframing is or isn't valid
├── Solve reframed problem AND explain gap to original
└── Acknowledge if original problem remains unsolved
Knowledge State → Honest Expression
├── Known Unknowns
│ ├── "We need empirical data on X before deciding"
│ ├── "Solution depends on unknown parameter Y"
│ └── "Requires research into unsolved problem Z"
├── Unknown Unknowns
│ ├── "This approach may have unforeseen complications"
│ ├── "Implementation may reveal additional constraints"
│ └── "Real-world usage patterns may differ from assumptions"
├── Fundamental Limits
│ ├── "This problem is provably undecidable"
│ ├── "Information-theoretic limitations prevent this approach"
│ └── "Physical constraints make this infeasible"
└── Empirical Uncertainty
  ├── "Based on limited data, estimate is X ± Y"
  ├── "Confidence decreases significantly outside tested range"
  └── "Extrapolation beyond observed conditions is speculative"
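An "estimate is X ± Y" statement needs a stated method behind Y. The sketch below computes a mean and a confidence half-width using only the standard library. It assumes a normal approximation, which understates the interval for small samples (a t-distribution would be wider), and the function names are hypothetical.

```python
import statistics


def estimate_with_uncertainty(samples, confidence=0.95):
    """Return (mean, half_width) so the estimate reads 'mean ± half_width'.

    Uses a normal approximation of the sampling distribution.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / len(samples) ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * std_err


def in_tested_range(value, tested_values):
    """True if value lies within the observed range; outside it,
    any estimate is extrapolation and should be labeled speculative."""
    return min(tested_values) <= value <= max(tested_values)


# Hypothetical measurements, for illustration only.
latencies_ms = [10, 12, 11, 13, 14]
mean, half_width = estimate_with_uncertainty(latencies_ms)
```

Reporting the sample size and method alongside the interval lets a reader judge how much weight the estimate can bear.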
Response Type → Appropriate Content
├── Research Summary
│ ├── State of current knowledge
│ ├── Key open questions
│ ├── Promising research directions
│ └── Timeline for potential breakthroughs
├── Engineering Solution
│ ├── Specific implementation approach
│ ├── Resource requirements and timeline
│ ├── Success metrics and testing strategy
│ └── Deployment and operational plan
└── Gap Analysis
  ├── What's solvable with current technology
  ├── What requires research breakthroughs
  ├── Risk mitigation for unsolved portions
  └── Go/no-go decision criteria
Original Position: [Technical proposal]
Steel-Man Process:
├── Identify strongest possible objections
├── Research best arguments for opposing view
├── Construct most charitable interpretation of criticism
├── Address steel-man objection with strongest possible response
└── Acknowledge if steel-man objection remains valid
Position: "AI-assisted development increases security through automated verification"
Weak Counter (Straw-man): "AI makes mistakes"
Strong Counter (Steel-man): "AI verification systems can be systematically compromised through adversarial examples that fool both the code generator and verifier, creating a false sense of security that's more dangerous than acknowledged uncertainty. Verification systems sharing cognitive architecture with generation systems cannot provide independent security validation."
Response Requirements:
├── Address specific attack scenario (adversarial examples)
├── Engage with fundamental critique (shared cognitive architecture)
├── Acknowledge validity of security concerns
├── Provide concrete mitigation or acknowledge limitation
└── Revise original position if steel-man argument is sound
Criticism Type → Integration Approach
├── Technical Objection
│ ├── Reproduce criticism scenario
│ ├── Analyze technical validity
│ ├── Modify proposal or acknowledge limitation
│ └── Test modified approach against objection
├── Empirical Challenge
│ ├── Examine evidence provided
│ ├── Identify data gaps or conflicts
│ ├── Design experiments to resolve disagreement
│ └── Update beliefs based on stronger evidence
├── Logical Inconsistency
│ ├── Trace logical error in reasoning
│ ├── Correct logical flaw
│ ├── Rebuild argument with valid logic
│ └── Check for additional inconsistencies
└── Fundamental Impossibility
  ├── Examine proof of impossibility
  ├── Verify proof validity
  ├── Abandon approach if proof is sound
  └── Look for ways to relax problem constraints
Experience Type → Validation Required
├── "Production deployment experience"
│ ├── Specific system names, companies, scales
│ ├── Time periods and role responsibilities
│ ├── Quantified outcomes and lessons learned
│ └── If no direct experience: "based on literature/case studies"
├── "War stories from building X"
│ ├── Specific technical details of what broke
│ ├── Timeline and resolution approach
│ ├── Team size and organizational context
│ └── Measurable impact and cost of failure
├── "In production, Y always happens"
│ ├── Specific systems where Y was observed
│ ├── Frequency and conditions of occurrence
│ ├── Mitigation strategies that were/weren't effective
│ └── Statistical data if available, anecdotal if not
└── "Operational reality shows Z"
  ├── Source: direct experience, team reports, industry surveys
  ├── Scope: specific technologies, company sizes, use cases
  ├── Time period: when this observation was valid
  └── Limitations: what contexts this may not apply to
Before: "Context manipulation happens constantly in production"
After: "Based on literature review of AI deployment case studies, context manipulation occurs in environments with adversarial users (chatbots, code completion). No direct production experience with AI code generation pipelines available."
Before: "Microservices always create debugging nightmares"
After: "Based on experience with microservices deployments at [specific company/project], debugging complexity increased ~3x due to distributed tracing requirements. Service count: 25-40, team size: 8 engineers, timeline: 2019-2021."
Before: "Kubernetes operational burden is substantial"
After: "Operational burden uncertain - no direct Kubernetes production experience. Industry reports suggest 20-40% resource overhead and dedicated platform team requirements, but specific costs vary by organization size and use cases."
This intellectual honesty enforcement ensures that HeadElf cannot retreat into comfortable vagueness when confronting difficult problems. It forces genuine engagement with technical challenges rather than sophisticated-sounding evasion.