skills/fullstack-agent-enhancing-agentic-fullstack/SKILL.md
Build production-grade full-stack web applications using a three-agent pipeline (Planning, Backend, Frontend) with development-oriented testing and structured debugging. Triggers: 'build a full-stack app', 'create a web app with backend and database', 'full-stack website with API and database', 'build me a CRUD app', 'create a web application with user authentication and data storage', 'scaffold a complete web project with frontend and backend'
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add ndpvt-web/arxiv-claude-skills fullstack-agent-enhancing-agentic-fullstack
This skill enables Claude to build complete, production-level full-stack web applications by following a structured three-agent pipeline derived from the FullStack-Agent framework. Instead of generating superficial frontend-only pages that mask the absence of real data processing, the approach enforces genuine backend logic, database persistence, and validated data flow. It decomposes development into Planning, Backend Coding, and Frontend Coding phases, each with integrated development-oriented testing that catches bugs through automated API probing and GUI-level interaction validation.
The FullStack-Agent framework addresses the core problem that most AI-generated web apps are frontend-only illusions — they render mock data with visual effects but lack genuine server-side logic and database persistence. The paper's key insight is to decompose full-stack development into three sequential agent phases with specialized tooling: a Planning Agent that produces a structured architecture plan (page layouts, components, API endpoints, database schema, and data flow), a Backend Coding Agent that implements all server-side logic first, and a Frontend Coding Agent that builds the UI against the actual running backend APIs.
The critical innovation is development-oriented testing embedded directly in the coding loop. Rather than writing tests after the fact, each coding agent has access to specialized debugging tools: a Backend Test Tool (analogous to Postman) that sends HTTP requests to API endpoints and returns both the response and backend console output, and a Frontend Test Tool that launches the full application, runs a GUI-agent interaction process, and monitors both terminal and browser console outputs. These tools allow the agent to iteratively detect and fix bugs during development, not after — reducing debugging iterations significantly (the paper reports average iterations dropping from 115.5 to 74.9 with the backend debugging tool alone).
The three-layer testing approach (FullStack-Bench) validates correctness at every level: frontend tests check that UI interactions produce correct visual results and trigger proper backend calls; backend tests verify that API endpoints return correct responses for given inputs; database tests extract table schemas and row snapshots to confirm data was actually persisted correctly. This ensures no layer can "fake" functionality.
Analyze the user request and produce a structured development plan. Write a JSON or markdown plan that specifies: (a) page layouts and components for each view, (b) backend API endpoints with methods, URLs, request/response schemas, (c) database tables with column names, types, and relationships, (d) data flow between frontend actions, API calls, and database operations. This plan is the contract all subsequent work follows.
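As a rough illustration of what such a plan could look like, the sketch below encodes the four parts (pages, endpoints, tables, data flow) as a plain object and checks that every data-flow entry points at an endpoint and table the plan actually declares. The field names and the `validatePlan` helper are assumptions made for this example, not a format prescribed by the paper.

```javascript
// Hypothetical plan shape; all field names here are illustrative assumptions.
const plan = {
  pages: [{ name: "Dashboard", components: ["TaskList", "TaskForm"] }],
  endpoints: [
    { method: "GET", url: "/tasks", response: { tasks: "Task[]" } },
    {
      method: "POST",
      url: "/tasks",
      request: { title: "string", dueDate: "string" },
      response: { task: "Task" },
    },
  ],
  tables: [
    {
      name: "tasks",
      columns: { id: "INTEGER", title: "TEXT", due_date: "TEXT", status: "TEXT" },
    },
  ],
  dataFlow: [
    { action: "submit TaskForm", endpoint: "POST /tasks", table: "tasks", operation: "INSERT" },
    { action: "load Dashboard", endpoint: "GET /tasks", table: "tasks", operation: "SELECT" },
  ],
};

// Returns a list of problems: data-flow entries that reference an endpoint
// or table the plan does not declare. An empty list means the plan is consistent.
function validatePlan(p) {
  const endpointKeys = new Set(p.endpoints.map((e) => `${e.method} ${e.url}`));
  const tableNames = new Set(p.tables.map((t) => t.name));
  const problems = [];
  for (const flow of p.dataFlow) {
    if (!endpointKeys.has(flow.endpoint)) problems.push(`unknown endpoint: ${flow.endpoint}`);
    if (!tableNames.has(flow.table)) problems.push(`unknown table: ${flow.table}`);
  }
  return problems;
}

console.log(validatePlan(plan));
```

Running a check like this before any coding starts is one cheap way to keep the plan usable as a contract: a data-flow entry with no matching endpoint or table is caught before it becomes a missing route.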
Select the technology stack and initialize the project. Based on the plan, choose appropriate frameworks (e.g., React/Vue + Express/FastAPI + SQLite/PostgreSQL). Create the project directory structure with separate frontend and backend directories. Install dependencies and configure the development environment.
Implement the database layer first. Create schema definitions, migration files, and seed data. Verify the schema by running migrations and inspecting the resulting tables. Confirm that column names, types, and constraints match the plan.
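One way to confirm that the migrated schema matches the plan is to diff the planned columns against what the database reports. The sketch below assumes SQLite, whose `PRAGMA table_info(<table>)` returns one row per column with `name` and `type` fields; the `diffSchema` helper and the sample rows are hypothetical, and fetching the rows from a real database is left out.

```javascript
// Compare planned columns ({ name: type }) with rows shaped like the output
// of SQLite's PRAGMA table_info. Returns a list of human-readable issues.
function diffSchema(plannedColumns, actualRows) {
  const actual = new Map(actualRows.map((r) => [r.name, String(r.type).toUpperCase()]));
  const issues = [];
  for (const [name, type] of Object.entries(plannedColumns)) {
    if (!actual.has(name)) {
      issues.push(`missing column: ${name}`);
    } else if (actual.get(name) !== type.toUpperCase()) {
      issues.push(`type mismatch on ${name}: planned ${type}, found ${actual.get(name)}`);
    }
  }
  for (const name of actual.keys()) {
    if (!Object.hasOwn(plannedColumns, name)) issues.push(`unplanned column: ${name}`);
  }
  return issues;
}

// Illustrative data: the plan expects due_date, the table has "done" instead.
const plannedTaskColumns = { id: "INTEGER", title: "TEXT", due_date: "TEXT", status: "TEXT" };
const reportedTaskColumns = [
  { name: "id", type: "INTEGER" },
  { name: "title", type: "TEXT" },
  { name: "status", type: "TEXT" },
  { name: "done", type: "INTEGER" },
];

console.log(diffSchema(plannedTaskColumns, reportedTaskColumns));
```

This only compares names and declared types; constraints such as NOT NULL or foreign keys would need the other fields of the pragma output.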
Build all backend API endpoints. Implement each endpoint from the plan: route handlers, request validation, business logic, and database queries. Follow RESTful conventions. Include proper error responses with meaningful status codes and messages.
Test each backend endpoint immediately after implementation. For every endpoint, run a concrete HTTP request using curl or a test script. Verify: (a) the response status code and body match expectations, (b) the backend console shows no errors, (c) the database state changed correctly (query the DB and inspect rows). Fix any issues before moving to the next endpoint.
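For repeated checks, the curl step can be replaced by a small probe script. The following is a minimal sketch, assuming Node 18 or later (for the global `fetch`) and a JSON API; `checkResponse` and `probe` are hypothetical helper names. It covers only check (a), the status code and body. Reading the backend console and querying the database remain separate steps.

```javascript
// Compare an observed HTTP result against expectations.
// expected: { status, bodyIncludes? }  observed: { status, body }
function checkResponse(expected, observed) {
  const failures = [];
  if (observed.status !== expected.status) {
    failures.push(`status: expected ${expected.status}, got ${observed.status}`);
  }
  for (const [key, value] of Object.entries(expected.bodyIncludes ?? {})) {
    const actual = observed.body?.[key];
    if (actual !== value) {
      failures.push(`body.${key}: expected ${JSON.stringify(value)}, got ${JSON.stringify(actual)}`);
    }
  }
  return failures;
}

// Send one request and check it. Resolves to a list of failures (empty = pass).
async function probe(baseUrl, { method, path, body, expected }) {
  const res = await fetch(baseUrl + path, {
    method,
    headers: { "Content-Type": "application/json" },
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  const text = await res.text();
  let parsed = null;
  try {
    parsed = JSON.parse(text);
  } catch {
    // Non-JSON bodies are treated as null so body checks fail visibly.
  }
  return checkResponse(expected, { status: res.status, body: parsed });
}
```

A call such as `probe("http://localhost:3001", { method: "POST", path: "/auth/register", body: {...}, expected: { status: 201 } })` would then be run once per endpoint, right after that endpoint is implemented.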
Generate an API summary document. List every endpoint with its URL, method, expected request body, and response format. This becomes the contract the frontend coding phase uses to integrate with the backend.
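If the endpoints are already described as data, the summary can be generated rather than written by hand. The sketch below renders a markdown table from a list of endpoint descriptions; the input shape and the `renderApiSummary` name are assumptions for illustration.

```javascript
// Render endpoint descriptions as a markdown table.
// Each endpoint: { method, url, request?, response }
function renderApiSummary(endpoints) {
  const lines = ["| Method | URL | Request body | Response |", "|---|---|---|---|"];
  for (const e of endpoints) {
    const request = e.request ? JSON.stringify(e.request) : "none";
    lines.push(`| ${e.method} | ${e.url} | ${request} | ${JSON.stringify(e.response)} |`);
  }
  return lines.join("\n");
}

console.log(
  renderApiSummary([
    { method: "GET", url: "/tasks", response: { tasks: "Task[]" } },
    { method: "POST", url: "/tasks", request: { title: "string" }, response: { task: "Task" } },
  ])
);
```

Generating the summary from the same data the backend was built against reduces the chance that the document and the implementation drift apart.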
Build the frontend against the live backend. Implement each page/component from the plan, wiring UI actions to actual API calls using the API summary. Use the real backend URL — never hardcode mock data. Handle loading states, errors, and edge cases in the UI.
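A thin fetch wrapper is one common way to keep every UI action pointed at the real backend. The sketch below is an assumption about how such a client might look, not the skill's prescribed code: `API_BASE_URL`, `buildRequest`, and `api` are hypothetical names, and reading `process.env` fits Node or a bundler that substitutes it (a Vite project, for example, would read its own env object instead).

```javascript
// Base URL comes from configuration; the fallback is for local development only.
const API_BASE = process.env.API_BASE_URL ?? "http://localhost:3001";

// Build the URL and fetch options for one call. Kept separate from the
// network call so it can be checked without a running server.
function buildRequest(path, { method = "GET", body, token } = {}) {
  const headers = { Accept: "application/json" };
  if (body !== undefined) headers["Content-Type"] = "application/json";
  if (token) headers.Authorization = `Bearer ${token}`;
  return {
    url: API_BASE + path,
    init: { method, headers, body: body === undefined ? undefined : JSON.stringify(body) },
  };
}

// Perform the call and surface backend errors to the UI as exceptions,
// so components can render an error state instead of failing silently.
async function api(path, options) {
  const { url, init } = buildRequest(path, options);
  const res = await fetch(url, init);
  const data = await res.json().catch(() => null);
  if (!res.ok) {
    throw new Error(data?.error ?? `Request failed with status ${res.status}`);
  }
  return data;
}
```

The `error` field read from a failed response is also an assumption; it should match whatever error shape the backend's API summary documents.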
Test the full application through GUI-level interaction. Start both frontend and backend servers. Walk through every user flow described in the plan: click buttons, fill forms, submit data, navigate between pages. After each interaction, verify: (a) the UI updated correctly, (b) the backend received the correct request (check server logs), (c) the database contains the expected data (query and inspect).
Validate database persistence end-to-end. After completing all user flows, extract the database contents (table schemas + sample rows) and verify they match expected state. This catches silent failures where the UI appears to work but data is lost or corrupted.
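The comparison itself can be automated once the rows are extracted. The sketch below takes the rows a user flow should have produced and reports any that have no match in the database snapshot; `findMissingRows` and the sample rows are hypothetical, and extracting the snapshot (for example with a `SELECT *` per table) is assumed to happen elsewhere.

```javascript
// expectedRows: partial rows that should exist after the user flows.
// actualRows: rows extracted from the database.
// Returns the expected rows with no matching actual row.
function findMissingRows(expectedRows, actualRows) {
  const matches = (expected, actual) =>
    Object.entries(expected).every(([key, value]) => actual[key] === value);
  return expectedRows.filter((exp) => !actualRows.some((act) => matches(exp, act)));
}

// Illustrative snapshot: three tasks, one of them completed.
const snapshot = [
  { id: 1, title: "Write plan", status: "completed" },
  { id: 2, title: "Build API", status: "pending" },
  { id: 3, title: "Build UI", status: "pending" },
];

console.log(
  findMissingRows(
    [{ title: "Write plan", status: "completed" }, { title: "Deploy", status: "pending" }],
    snapshot
  )
);
```

Matching on a subset of columns keeps the check independent of generated values such as ids and timestamps.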
Fix any issues found during testing by localizing the bug layer. When a test fails, determine whether the fault is in the frontend (wrong API call, incorrect payload), the backend (wrong logic, missing validation), or the database (wrong schema, missing migration). Fix at the correct layer and re-run the relevant tests.
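The localization step can be read as a short decision procedure over what the three checks observed. The sketch below is a first-guess heuristic under that reading, with hypothetical names throughout; in particular, a successful response with no persisted row is attributed to the database layer here, although a backend that never issues the write would produce the same symptom.

```javascript
// observations: booleans gathered from the request log, the API response,
// the database snapshot, and the rendered UI.
function localizeFault({ requestCorrect, responseCorrect, dataPersisted, uiCorrect }) {
  if (!requestCorrect) return "frontend"; // wrong API call or payload
  if (!responseCorrect) return "backend"; // wrong logic or missing validation
  if (!dataPersisted) return "database"; // wrong schema or missing migration
  if (!uiCorrect) return "frontend"; // data is fine but rendered incorrectly
  return "none";
}

// "Save seems to work but data is gone after refresh":
console.log(
  localizeFault({ requestCorrect: true, responseCorrect: true, dataPersisted: false, uiCorrect: true })
);
```

Checking the layers in request order means the earliest broken layer is reported, which is usually where the fix belongs.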
Example 1: Build a Task Manager with User Authentication
User: "Build me a full-stack task manager app where users can sign up, log in, create tasks with due dates, mark them complete, and filter by status."
Approach:
`curl -X POST localhost:3001/auth/register -H 'Content-Type: application/json' -d '{"email":"[email protected]","password":"pass123"}'`: verify 201 response and user row in DB. Test login, then use the JWT to create/read/update/delete tasks.
`SELECT * FROM tasks`: confirm 3 rows, one with status="completed".
Output structure:
task-manager/
  backend/
    server.js          # Express app with auth middleware
    db.js              # SQLite connection and migrations
    routes/auth.js     # Register and login endpoints
    routes/tasks.js    # CRUD task endpoints
  frontend/
    src/
      App.jsx          # Router with auth context
      pages/Login.jsx
      pages/Register.jsx
      pages/Dashboard.jsx
      components/TaskList.jsx
      components/TaskForm.jsx
      api/client.js    # Fetch wrapper with JWT
Example 2: Build an E-Commerce Product Catalog with Cart
User: "Create a simple e-commerce site with a product catalog, shopping cart, and checkout that saves orders to a database."
Approach:
Example 3: Debugging a Full-Stack Bug
User: "My app shows data on the frontend but when I refresh it's gone. The save button seems to work but data doesn't persist."
Approach:
| Problem | Diagnosis | Fix |
|---|---|---|
| API returns 500 but no clear error | Backend lacks error middleware or swallows exceptions | Add global error handler that logs full stack traces; check that async route handlers have try/catch or express-async-errors |
| Frontend shows stale data after mutation | Cache invalidation issue or missing refetch | After POST/PATCH/DELETE, explicitly refetch the relevant GET endpoint or invalidate the query cache |
| Database table missing columns | Migration not run or schema out of sync | Re-run migrations; compare DB schema against plan; add the missing ALTER TABLE |
| CORS errors blocking API calls | Backend missing CORS headers for frontend origin | Configure CORS middleware with the correct frontend URL and allowed methods |
| Data saves but relationships are broken | Foreign key constraints missing or wrong IDs | Add proper FOREIGN KEY constraints; verify IDs are passed correctly in API requests |
| App works in dev but fails in production | Hardcoded localhost URLs or missing env vars | Use environment variables for all URLs and secrets; verify .env files exist in production |
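For the last row of the table, failing fast at startup tends to be easier to debug than a request that breaks later. The sketch below is one possible shape for that check; `loadConfig` and the variable names `API_BASE_URL` and `JWT_SECRET` are hypothetical.

```javascript
// Read the required keys from an env-like object and throw if any are
// missing or empty, so a misconfigured deployment stops at startup.
function loadConfig(env, requiredKeys) {
  const missing = requiredKeys.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return Object.fromEntries(requiredKeys.map((key) => [key, env[key]]));
}

// Illustrative usage with an explicit object; a server would pass process.env.
const exampleConfig = loadConfig(
  { API_BASE_URL: "https://api.example.com", JWT_SECRET: "change-me", UNUSED: "x" },
  ["API_BASE_URL", "JWT_SECRET"]
);
console.log(Object.keys(exampleConfig));
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the check easy to exercise with different configurations.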
Paper: FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation — Lu et al., 2026. Key insight: decompose full-stack development into Planning/Backend/Frontend phases with embedded debugging tools (Postman-like API tester + GUI-agent browser tester) that validate correctness at every layer during development, not after.