
Multi-model validation council — auto-validate plans, architecture changes, and PRs via validate-plan/review before executing
Mandatory code reviews via /code-review before commits and deploys
# Visual Validation - Autonomous Screenshot Verification

## Philosophy

Every UI change should be visually verified before it ships. Peekaboo captures pixel-accurate screenshots. The system compares before/after and flags visual regressions. No manual "looks good to me": the machine verifies what the machine built.

## Autonomous Flow

```
static/* files modified (detected by auto-review-hook or E2E testkit)
  ↓
peekaboo image --mode screen → ~/.maggy/visual-verify/after-{ts}.png
  ↓
Compa
```
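The before/after comparison described above can be sketched as a plain pixel diff. This is a minimal illustration, not the project's actual comparison step (the source is cut off before that step is described). The names `changed_fraction` and `is_regression`, the 1% threshold, and the in-memory pixel grids are all assumptions made for the example; a real pipeline would first decode the PNG files that Peekaboo writes.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each 0-255
Image = Sequence[Sequence[Pixel]]     # rows of pixels (hypothetical in-memory form)


def changed_fraction(before: Image, after: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels whose largest channel difference exceeds `tolerance`.

    Images with different dimensions are treated as fully changed (1.0).
    """
    if len(before) != len(after) or any(
        len(row_b) != len(row_a) for row_b, row_a in zip(before, after)
    ):
        return 1.0
    total = sum(len(row) for row in before)
    if total == 0:
        return 0.0
    changed = 0
    for row_b, row_a in zip(before, after):
        for pixel_b, pixel_a in zip(row_b, row_a):
            if max(abs(cb - ca) for cb, ca in zip(pixel_b, pixel_a)) > tolerance:
                changed += 1
    return changed / total


def is_regression(
    before: Image, after: Image, max_changed: float = 0.01, tolerance: int = 0
) -> bool:
    """Flag a visual regression when more than `max_changed` of the pixels differ.

    The 1% default is an assumed threshold, not a value from the source.
    """
    return changed_fraction(before, after, tolerance) > max_changed
```

A small per-channel `tolerance` helps absorb anti-aliasing noise, while `max_changed` controls how much of the screen may differ before the change is flagged.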
# Autonomous Testing Agent

## Overview

An AI-driven testing agent that auto-discovers, generates, executes, evaluates, and fixes tests for any project type. Inspired by the edubites autonomous test runner pattern, generalized for Claude Bootstrap + Maggy.

## Pipeline

```
Source Scan → Discover Gaps → Generate Tests → Execute → Evaluate → Report → Fix Loop
```

## Phase 1: Discover - What Needs Testing?

```
Auto-detect project type:
  Python → scan for *.py files, extract public functions
```
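For the Python branch of the discovery phase, one way to scan `*.py` files and extract public functions is the standard-library `ast` module. The sketch below is illustrative only: `public_functions` and `scan_project` are hypothetical names, and treating "public" as "top-level and not prefixed with an underscore" is an assumption about what the agent counts.

```python
import ast
from pathlib import Path
from typing import Dict, List


def public_functions(source: str) -> List[str]:
    """Return names of top-level functions that do not start with an underscore."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
    ]


def scan_project(root: str) -> Dict[str, List[str]]:
    """Map each *.py file under `root` to its public top-level functions.

    Files that cannot be decoded or parsed are skipped rather than aborting the scan.
    """
    found: Dict[str, List[str]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            names = public_functions(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue
        if names:
            found[str(path)] = names
    return found
```

The resulting mapping could then be compared against the existing test files to find functions with no coverage, which is the "Discover Gaps" step of the pipeline.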
# Build in Public - Best Practices

## Philosophy

Build in public isn't marketing. It's letting people watch you work. The best posts feel like you're narrating your thought process to a friend who's also a senior engineer. No hype sludge. No "I'm excited to announce." Just: here's what I built, here's why it matters, here's what I learned.

## What to Share (and What Not To)

**Share:**
- Technical decisions and the reasoning behind them ("Chose SQLite over Postgres because...")
- Architecture
# Model Routing System

## How Routing Decisions Are Made

Every user prompt goes through a 9-tier classification pipeline before any AI model processes it. The system answers three questions:

1. **Which model should handle this?** 9-tier cost/complexity classification
2. **Is the classifier itself working?** Cascading fallback (qwen3 → kimi → deepseek → cache)
3. **Can we verify the result?** Tool-level fallback + auto-evaluation

### The Pipeline

```
User types prompt
  ↓
UserPromptS
```
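The cascading fallback named above (qwen3 → kimi → deepseek → cache) can be sketched as a loop that tries each classifier in order and falls back to a cached answer when all of them fail. This is a minimal sketch under stated assumptions: the classifiers are plain callables, the cache is a dict keyed by the exact prompt text, and `classify_with_fallback`, `unavailable`, `always_cheap`, and the tier label `"CHEAP"` are hypothetical names invented for the example.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Classifier = Callable[[str], str]  # takes a prompt, returns a tier label


def classify_with_fallback(
    prompt: str,
    classifiers: Sequence[Tuple[str, Classifier]],
    cache: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Try each classifier in order; return (tier, source).

    A classifier that raises is treated as unavailable and the next one is tried.
    If every classifier fails, a cached tier for the same prompt is used.
    Returns (None, "none") when nothing can answer.
    """
    for name, classify in classifiers:
        try:
            tier = classify(prompt)
        except Exception:
            continue  # classifier is down or errored; fall through to the next
        cache[prompt] = tier  # remember the answer for future outages
        return tier, name
    if prompt in cache:
        return cache[prompt], "cache"
    return None, "none"


# Hypothetical stand-ins for real model calls, used only to exercise the chain.
def unavailable(prompt: str) -> str:
    raise ConnectionError("classifier unreachable")


def always_cheap(prompt: str) -> str:
    return "CHEAP"
```

Returning the source alongside the tier makes it easy to log which link in the chain actually answered, which helps with the "is the classifier itself working?" question.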
# External Model Delegation Pattern

A `UserPromptSubmit` hook classifies every user prompt into one of six cost/performance tiers. The hook injects `additionalContext` instructing Claude to run a specific delegation script and return the output.

## Tier routing table

| Tier | Delegation command | Cost |
|------|-------------------|------|
| QWEN | `qwen3 "prompt"` | $0 (local Ollama) |
| DEEPSEEK_FLASH | `deepseek --flash "prompt"` | $0.14 / $0.28 per M tokens |
| DEEPSEEK_PRO | `deepseek --p
Task-scoped memory lifecycle — typed MnemoGraph prevents lossy context compaction by treating facts/decisions/code-refs/handoffs as distinct node types with per-type eviction policies
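To make "per-type eviction policies" concrete, here is a minimal sketch that gives each of the four node types its own maximum age. It is not MnemoGraph's API: `NodeType`, `MemoryNode`, `MAX_AGE_TURNS`, `evict`, and every limit in the table are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class NodeType(Enum):
    FACT = "fact"
    DECISION = "decision"
    CODE_REF = "code_ref"
    HANDOFF = "handoff"


@dataclass
class MemoryNode:
    kind: NodeType
    text: str
    age_turns: int  # turns since the node was last touched (assumed metric)


# Assumed policy: maximum age in turns per node type; None means never evicted.
MAX_AGE_TURNS: Dict[NodeType, Optional[int]] = {
    NodeType.FACT: 50,
    NodeType.DECISION: None,
    NodeType.CODE_REF: 20,
    NodeType.HANDOFF: 5,
}


def evict(nodes: List[MemoryNode]) -> List[MemoryNode]:
    """Return the nodes that survive their type's eviction policy, in original order."""
    kept = []
    for node in nodes:
        limit = MAX_AGE_TURNS[node.kind]
        if limit is None or node.age_turns <= limit:
            kept.append(node)
    return kept
```

Because the policy is keyed by type, an old decision can outlive a recent handoff note, which is the kind of distinction a single age-based compaction pass would lose.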
Maggy is a local AI engineering command center. AI-prioritized inbox across issue trackers (GitHub Issues/Asana), one-click TDD execute with iCPG context enrichment, daily competitor intelligence briefing.
Multi-agent orchestration with container-isolated workspaces — each agent session runs in its own Docker container with independent git branches
Claude Code Agent Teams - default team-based development with strict TDD pipeline enforcement
Dynamic multi-repo and monorepo awareness for Claude Code. Analyze workspace topology, track API contracts, and maintain cross-repo context.
Cross-agent task routing — Codex auto-review, Kimi delegation by complexity score (iCPG + Claude reasoning), iCPG + Mnemos mandatory for all agents
Intent-Augmented Code Property Graph — tracks WHY code exists via ReasonNodes with formal contracts, 6-dimension drift detection, and 3 canonical pre-task queries for autonomous development
React web development with hooks, React Query, Zustand
AST-based code graph for fast symbol lookup, dependency analysis, and blast radius via codebase-memory-mcp MCP server
Schema awareness - read before coding, type generation, prevent column errors
Firebase Firestore, Auth, Storage, real-time listeners, security rules
Google Gemini CLI code review with Gemini 2.5 Pro, 1M token context, CI/CD integration
Klaviyo email/SMS marketing - profiles, events, flows, segmentation
E2E testing with Playwright - Page Objects, cross-browser, CI/CD
Python development with ruff, mypy, pytest - TDD and type safety
React Native mobile patterns, platform-specific code
OWASP security patterns, secrets management, security testing
Shopify app development - Remix, Admin API, checkout extensions
Express/Hono with Supabase and Drizzle ORM
Multi-person projects - shared state, todo claiming, handoffs
Web UI - glassmorphism, Tailwind, dark mode, accessibility
SEO and AI discovery (GEO) - schema, ChatGPT/Perplexity optimization
Android Kotlin development with Coroutines, Jetpack Compose, Hilt, and MockK testing
Prevent semantic code duplication with capability index and check-before-write
Deep code property graph analysis with Joern CPG (AST+CFG+PDG) and CodeQL for control flow, data flow, taint analysis, and security auditing
User experience flows - journey mapping, UX validation, error recovery
Microsoft Teams bots and AI agents - Claude/OpenAI, Adaptive Cards, Graph API
gh, vercel, supabase, render CLI and deployment platform setup
Reddit Ads API - campaigns, targeting, conversions, agentic optimization
Stripe Checkout, subscriptions, webhooks, customer portal
Medusa headless commerce - modules, workflows, API routes, admin UI
AI Engine Optimization - semantic triples, page templates, content clusters for AI citations
Android Java development with MVVM, ViewBinding, and Espresso testing
Analyze existing repositories, maintain structure, setup guardrails and best practices
TDD iteration loops using Claude Code Stop hooks - runs tests after each response, feeds failures back automatically
Node.js backend patterns with Express/Fastify, repositories
Context preservation, tiered summarization, resumability
Next.js with Supabase and Drizzle ORM
FastAPI with Supabase and SQLAlchemy/SQLModel
Mobile UI patterns - React Native, iOS/Android, touch targets
WooCommerce REST API - products, orders, customers, webhooks
Reddit API with PRAW (Python) and Snoowrap (Node.js)
Atomic commits, PR size limits, commit thresholds, stacked PRs
Flutter development with Riverpod state management, Freezed, go_router, and mocktail testing
Build AI agents with Pydantic AI (Python) and Claude SDK (Node.js)
TypeScript strict mode with eslint and jest
Technical SEO - robots.txt, sitemap, meta tags, Core Web Vitals
OpenAI Codex CLI code review with GPT-5.2-Codex, CI/CD integration
Core Supabase CLI, migrations, RLS, Edge Functions
AWS DynamoDB single-table design, GSI patterns, SDK v3 TypeScript/Python
Centralized API key management from Access.txt
AI-first application patterns, LLM testing, prompt management
Progressive Web Apps - service workers, caching strategies, offline, Workbox
Cloudflare D1 SQLite database with Workers, Drizzle ORM, migrations
PostHog analytics, event tracking, feature flags, dashboards
Latest AI models reference - Claude, OpenAI, Gemini, Eleven Labs, Replicate
AWS Aurora Serverless v2, RDS Proxy, Data API, connection pooling
Universal coding patterns, constraints, TDD workflow, atomic todos
Create Jira/Asana/Linear tickets optimized for Claude Code execution - AI-native ticket writing
Visual testing - catch invisible buttons, broken layouts, contrast
Azure Cosmos DB partition keys, consistency levels, change feed, SDK patterns