skills/davinci-dev-agent-native-mid-training-software/SKILL.md
Apply daVinci-Dev's agent-native workflow to software engineering tasks: navigate repos, localize bugs, plan edits, apply structured patches, and verify with tests. Use when asked to 'fix this issue in the repo', 'resolve this bug across files', 'apply agentic SWE workflow', 'navigate and patch this codebase', 'use agent-native approach to solve this', or 'debug and fix with test verification'.
This skill enables Claude to tackle complex, multi-file software engineering tasks using the agent-native workflow from daVinci-Dev. Instead of generating a single code block, Claude systematically navigates the repository to build context, localizes the relevant code, plans precise edits as search-and-replace operations, and verifies correctness by running tests. This mirrors the think-act-observe loop captured in the trajectories daVinci-Dev trains on, which yielded a 58.5% resolution rate on SWE-Bench Verified with a 72B model.
daVinci-Dev identifies that effective software engineering agents need two complementary types of experience. Contextually-native trajectories preserve the full information flow an agent encounters: the issue description, repository structure, relevant file contents, and the reasoning chain that leads to a specific edit. Environmentally-native trajectories go further — they capture actual tool invocations (bash commands, file views, string replacements) and real execution feedback (test output, error messages) from running against live repositories.
The critical insight is distribution fidelity: training data must match the actual experience of working in a codebase. Static code-diff pairs miss the navigation, search, dead-ends, and iterative refinement that characterize real development. The daVinci-Dev workflow structures each task as a loop of think (reason about what to do next), act (invoke a tool — search, view, edit, or run tests), and observe (read the tool output and update your plan). This loop continues until tests pass or the problem is resolved.
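The loop can be sketched as plain control flow. This is a minimal sketch under a few assumptions. A hypothetical `policy` callable stands in for the model and returns a thought plus the next action. A `tools` dict maps action names to callables that return text observations. A "submit" action signals that the problem is resolved. The `Step` and `Trajectory` names are illustrative and are not the trajectory format used in the daVinci-Dev repository.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Step:
    thought: str           # think: reasoning behind this step
    action: str            # act: tool name, e.g. "bash" or "submit"
    argument: str          # tool input, e.g. a shell command
    observation: str = ""  # observe: tool output fed back into the next decision


@dataclass
class Trajectory:
    steps: list[Step] = field(default_factory=list)
    submitted: bool = False


# Hypothetical policy signature: history in, (thought, action, argument) out.
Policy = Callable[[Trajectory], tuple[str, str, str]]


def run_loop(policy: Policy, tools: dict[str, Callable[[str], str]], max_steps: int = 30) -> Trajectory:
    """Run think-act-observe until the policy submits or the step budget runs out."""
    traj = Trajectory()
    for _ in range(max_steps):
        thought, action, argument = policy(traj)  # think: choose the next action from the history
        step = Step(thought, action, argument)
        traj.steps.append(step)
        if action == "submit":
            traj.submitted = True
            break
        step.observation = tools[action](argument)  # act, then record what was observed
    return traj
```

The step budget is a design choice. Without `max_steps`, a policy that never submits would loop forever. With it, the recorded trajectory still preserves every action and observation up to that point.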
The action space consists of three core tools: bash (run arbitrary shell commands for searching, grepping, running tests), str_replace_editor (view files, create files, perform exact string replacements, insert lines), and submit (finalize when done). Edits are expressed as precise search-and-replace operations rather than full-file rewrites, minimizing unintended changes and making each modification auditable.
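The following is a minimal sketch of why exact search-and-replace edits stay auditable. It assumes the file contents are already loaded as a string. It refuses an edit unless the old text matches exactly one location, which is how str_replace-style editors commonly behave. The function name and error messages are hypothetical.

```python
def str_replace(text: str, old: str, new: str) -> str:
    """Apply one exact search-and-replace edit, refusing missing or ambiguous matches."""
    if not old:
        raise ValueError("old string must be non-empty")
    count = text.count(old)  # counts non-overlapping exact occurrences
    if count == 0:
        raise ValueError("old string not found; re-view the file and copy it exactly")
    if count > 1:
        raise ValueError(f"old string matches {count} locations; include more surrounding lines")
    return text.replace(old, new, 1)
```

For example, when `"a = 1"` appears twice in a file, the edit is rejected. Extending the old text to `"a = 1\nb"` makes the match unique, and only that occurrence changes.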
1. Parse the issue: Read the bug report, feature request, or failing test. Extract the expected behavior, actual behavior, reproduction steps, and any stack traces or error messages. Identify key symbols (function names, class names, error strings) to search for.
2. Explore repository structure: Run find or ls on the project root to understand the directory layout. Identify the language, framework, test infrastructure, and entry points. Read the README or setup files if the project is unfamiliar.
3. Localize with targeted search: Use grep -rn or rg with the key symbols from step 1 to find candidate files. Prioritize: error messages first, then function/class names, then module paths mentioned in stack traces. Read each candidate file to confirm relevance.
4. Build deep context on relevant files: For each file you plan to modify, read the full file (or the relevant section). Understand the surrounding logic, data flow, and any callers/callees. Check imports and type definitions that constrain the change.
5. Formulate an edit plan: Before touching any code, articulate the precise changes needed. State which file, which lines, what the current code does, and what the replacement should do. If multiple files need changes, determine the correct order (e.g., update the interface before the implementation).
6. Apply edits as exact string replacements: Use the Edit tool with old_string / new_string pairs. Each replacement should be minimal and surgical: change only what is necessary. Never rewrite entire files when a targeted replacement suffices.
7. Run the relevant tests: Execute the test suite (or the specific test file) that covers the changed code. Read the output carefully. If tests fail, return to step 3 with the new error information and iterate.
8. Verify no regressions: Run a broader test sweep if available (e.g., the full test suite or a related test directory). Check that your changes don't break unrelated functionality.
9. Review the final diff: Examine all changes together. Confirm each edit is necessary, no debug code remains, and the fix is minimal. Remove any exploratory changes that aren't part of the solution.
10. Summarize the resolution: Explain what caused the issue, what was changed, and why each change is correct. Reference the specific files and line numbers modified.
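The search priority in the localization step can be sketched in pure Python. Here, substring matching stands in for grep -rn or rg. The function name, its parameters, and the tuple-based scoring are assumptions made for illustration. It ranks files by error-message hits first, then symbol hits, then matches against module paths taken from a stack trace.

```python
import os


def localize(root, error_strings=(), symbols=(), module_hints=(), exts=(".py",)):
    """Return candidate files under root, ranked by the step-3 search priorities."""
    scores = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]  # skip .git, .venv, etc.
        for name in filenames:
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                text = fh.read()
            rel = os.path.relpath(path, root)
            # Tuples compare element by element, so error-message hits outrank
            # symbol hits, which in turn outrank module-path hints.
            score = (
                sum(s in text for s in error_strings),
                sum(s in text for s in symbols),
                sum(h in rel for h in module_hints),
            )
            if any(score):
                scores[rel] = score
    return sorted(scores, key=scores.get, reverse=True)
```

The ranking only orders candidates. Each returned file still has to be read to confirm it is relevant.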
Example 1: Fixing a KeyError in a Django view
User: "I'm getting a KeyError: 'user_id' in our API endpoint at /api/profile/.
The traceback points to views.py line 42."
Approach:
1. Read the traceback to identify views.py and the specific line
2. Search for 'user_id' usage: grep -rn "user_id" src/api/
3. Read views.py to understand the profile endpoint logic
4. Discover that request.data['user_id'] assumes the key exists,
but the frontend sends 'userId' (camelCase) after a recent refactor
5. Read the serializer and frontend code to confirm the field name change
6. Plan: update views.py to use request.data.get('userId') or update
the serializer to accept both formats
Edit applied (views.py):
old_string: "user_id = request.data['user_id']"
new_string: "user_id = request.data.get('userId', request.data.get('user_id'))"
7. Run: python manage.py test api.tests.test_profile -v2
Result: 4 tests pass, 0 failures
8. Run: python manage.py test api/ -v2
Result: 47 tests pass, 0 failures
Example 2: Resolving a race condition in async code
User: "Our background job processor sometimes processes the same job twice.
The relevant code is in workers/job_processor.py."
Approach:
1. Read workers/job_processor.py — find the job claim logic
2. Search for locking mechanisms: grep -rn "lock\|atomic\|transaction" workers/
3. Discover that claim_job() reads then updates the job status in two
separate DB calls without a transaction
4. Read the database model to understand available locking primitives
5. Check if the ORM supports select_for_update or similar
Edit plan:
- Wrap the read-and-update in an atomic transaction with row-level locking
- File: workers/job_processor.py
Edit applied:
old_string: |
  job = Job.objects.filter(status='pending').first()
  if job:
      job.status = 'processing'
      job.save()
new_string: |
  from django.db import transaction
  with transaction.atomic():
      job = Job.objects.select_for_update(
          skip_locked=True
      ).filter(status='pending').first()
      if job:
          job.status = 'processing'
          job.save()
6. Run: python -m pytest workers/tests/test_job_processor.py -v
Result: all 12 tests pass including the new concurrency test
7. Run the full worker test suite to check for regressions
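The claim logic can be illustrated without a database. This is an in-memory analogue, not Django: a hypothetical `JobQueue` holds job statuses, and a `threading.Lock` makes "find a pending job, then mark it processing" one atomic step. That is the role `transaction.atomic()` plus `select_for_update()` plays in the fix. One difference is important. This single lock serializes every claim, whereas `select_for_update(skip_locked=True)` locks individual rows and lets other workers skip past them.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class JobQueue:
    """In-memory stand-in for the Job table (hypothetical; illustration only)."""

    def __init__(self, n_jobs: int):
        self.status = {job_id: "pending" for job_id in range(n_jobs)}
        self._lock = threading.Lock()

    def claim_job(self):
        # Holding the lock closes the check-then-act window: no other worker can
        # observe a job as "pending" between this read and this update.
        with self._lock:
            for job_id, state in self.status.items():
                if state == "pending":
                    self.status[job_id] = "processing"
                    return job_id
            return None


def drain(queue: JobQueue) -> list[int]:
    """Worker loop: keep claiming jobs until none are pending."""
    claimed = []
    while (job_id := queue.claim_job()) is not None:
        claimed.append(job_id)
    return claimed


def run_workers(n_jobs: int = 200, n_workers: int = 8) -> list[int]:
    """Run several workers concurrently and return every job id they claimed."""
    queue = JobQueue(n_jobs)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        batches = list(pool.map(lambda _: drain(queue), range(n_workers)))
    return [job_id for batch in batches for job_id in batch]
```

A concurrency test in this style asserts that every job is claimed exactly once across all workers. Removing the lock reopens the window the original bug had, although thread scheduling may not expose it on every run.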
Example 3: Adding a missing feature across multiple files
User: "Add CSV export to our report endpoint. We already have JSON export
in reports/views.py. The tests are in reports/tests/test_export.py."
Approach:
1. Read reports/views.py to understand the existing JSON export logic
2. Read reports/tests/test_export.py to see existing test patterns
3. Search for any existing CSV utilities: grep -rn "csv\|CSV" src/
4. Read reports/serializers.py to understand the data format
Edit plan (3 files):
a) reports/views.py: add CSVExportView class mirroring JSONExportView
b) reports/urls.py: add URL route for /reports/export/csv/
c) reports/tests/test_export.py: add CSV export test cases (e.g., test_csv_export)
Apply edits sequentially:
- First: add the CSV view class to views.py
- Second: add the URL pattern to urls.py
- Third: add the test cases
Run: python -m pytest reports/tests/test_export.py -v
Result: 6 tests pass (3 existing JSON + 3 new CSV)
Run: python -m pytest reports/ -v
Result: all 24 tests pass
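The core of a CSVExportView is turning the same serialized rows the JSON export uses into CSV text. This sketch uses only the standard library and assumes the serializer output is a list of dicts. The helper name `rows_to_csv` and the field names in the usage note are hypothetical. `extrasaction="ignore"` drops keys that are not in the chosen columns, so the export does not fail on extra serializer fields.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts (e.g. serializer output) to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)  # values containing commas or quotes are quoted automatically
    return buffer.getvalue()
```

For instance, `rows_to_csv([{"id": 1, "name": "Q1"}], ["id", "name"])` yields a header line and one data line, each ending in `\r\n`, which is the csv module's default line terminator. In Django, the view could instead pass an `HttpResponse(content_type="text/csv")` to the writer, since the response object is file-like.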
Prefer targeted string replacements over full-file rewrites; this matches the str_replace_editor action space and keeps changes minimal and auditable.
If a search returns too many matches, narrow it with file-type filters (e.g., --include='*.py'), path restrictions, or more specific patterns. Use function signatures rather than common variable names.
If an edit fails because old_string matches more than one location, expand old_string to include more surrounding context (additional lines before/after) until the match is unique.
Paper: daVinci-Dev: Agent-native Mid-training for Software Engineering. Look for Section 3 on agent-native data synthesis (contextually-native vs. environmentally-native trajectories) and Section 4 on the mid-training methodology that achieves 58.5% on SWE-Bench Verified with a 72B model.
Code & Data: github.com/GAIR-NLP/daVinci-Dev — The pipeline directory contains the data construction code; env_traj_utils shows the trajectory format with bash, str_replace_editor, and submit actions.