skills/context-augmented-code-generation-programming-know/SKILL.md
Enhance code generation with Programming Knowledge Graph (PKG) retrieval, tree pruning, and re-ranking. Uses fine-grained knowledge graph nodes (function blocks, code snippets, documentation paths) to retrieve precisely relevant context, then prunes and re-ranks results to minimize hallucination.

Trigger phrases:
- "Generate code using knowledge graph context"
- "Help me solve this coding problem with relevant examples"
- "Retrieve similar code patterns for this function"
- "Augment my code generation with external knowledge"
- "Find relevant code context and generate a solution"
- "Use PKG-style retrieval for this programming task"
npx skillsauth add ndpvt-web/arxiv-claude-skills context-augmented-code-generation-programming-know

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to apply the Programming Knowledge Graph (PKG) technique from Seddik et al. (2026) to real code generation tasks. Instead of treating external code knowledge as flat documents, Claude structures relevant knowledge into a hierarchical graph of fine-grained nodes (function names, implementation blocks, code fragments, documentation paths), retrieves the most relevant nodes via semantic similarity, prunes irrelevant branches, and re-ranks candidate solutions to suppress hallucination. The result is more accurate, contextually grounded code generation, especially for complex problems where naive LLM generation fails.
Programming Knowledge Graphs (PKGs) decompose code and text into fine-grained graph nodes rather than chunking documents at arbitrary boundaries. For code, an AST-based parser (FunctionAnalyzer) extracts a hierarchy: function name nodes -> implementation nodes -> constituent code block nodes. For text/documentation, JSON-like structures are converted into directed acyclic graphs of (path, value) pairs. Each node gets a semantic embedding (the paper uses VoyageCode2). This granularity means retrieval can return a single relevant code block instead of an entire file, dramatically improving precision.
Tree Pruning addresses the problem of retrieving too much context. After identifying candidate nodes via cosine similarity, the system models retrieved subgraphs as DAGs and iteratively removes branches. Each pruned variant is re-embedded and scored against the query; the variant maximizing similarity is selected. This eliminates tangential code that would confuse the generator.
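As a rough sketch of this idea, the loop below greedily drops one branch at a time for as long as the re-scored context gets closer to the query. The bag-of-words `embed` function is only a stand-in for a real embedding model, and the greedy stopping rule is an assumption of this example rather than the paper's exact procedure.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def prune(branches: list[str], query: str) -> list[str]:
    """Greedily drop one branch at a time while similarity to the query improves."""
    q = embed(query)
    kept = list(branches)
    best = cosine(embed(" ".join(kept)), q)
    while len(kept) > 1:
        # Re-embed and score every variant that omits exactly one branch.
        variants = [kept[:i] + kept[i + 1:] for i in range(len(kept))]
        score, variant = max(
            ((cosine(embed(" ".join(v)), q), v) for v in variants),
            key=lambda pair: pair[0],
        )
        if score <= best:
            break  # no single removal helps, so stop pruning
        best, kept = score, variant
    return kept
```

With the query "merge sort a list", a branch about reading files is removed while the two sorting-related branches are kept.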
Re-Ranking with Non-RAG Integration is the hallucination defense. Candidate solutions go through a three-stage pipeline: (1) syntactic validation via AST parsing to eliminate malformed code, (2) runtime execution to remove candidates that crash, and (3) semantic similarity scoring against the original query. Critically, baseline (non-RAG) solutions are included in the candidate pool, so if retrieval introduces noise, the system can fall back to the model's own knowledge. This yielded up to 20% pass@1 gains on HumanEval and 34% improvement over baselines on MBPP while causing minimal regression on problems already solvable without retrieval.
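The three stages can be sketched as a single filter-then-sort function. This is an illustrative outline under simplifying assumptions: `token_overlap` is a toy replacement for embedding-based similarity, the `rerank` name and candidate labels are hypothetical, and `exec` is used only to keep the example short. Untrusted candidates should be run in a sandboxed subprocess instead.

```python
import ast
import re


def token_overlap(a: str, b: str) -> float:
    """Toy Jaccard similarity; a stand-in for embedding-based scoring."""
    ta, tb = (set(re.findall(r"[a-z0-9]+", s.lower())) for s in (a, b))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def rerank(candidates: dict[str, str], query: str) -> list[str]:
    """Return labels of candidates that survive stages 1-2, best stage-3 score first."""
    survivors = []
    for label, code in candidates.items():
        try:
            ast.parse(code)                        # stage 1: syntactic validation
            exec(code, {"__name__": "candidate"})  # stage 2: does it run without crashing?
        except Exception:
            continue
        survivors.append(label)
    # Stage 3: order the survivors by similarity to the original query.
    return sorted(
        survivors,
        key=lambda label: token_overlap(candidates[label], query),
        reverse=True,
    )
```

Because the baseline (non-RAG) candidate goes through the same filter, it can still win when the retrieved context produced broken or off-topic code.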
Decompose the problem statement into a clear natural language query and identify what type of code artifact is needed (function, class, module, algorithm). Extract key concepts: data structures involved, algorithms referenced, APIs mentioned.
Build a local knowledge graph from available context. Scan the user's codebase, imported libraries, or provided documentation. For each code file, extract functions and classes using AST parsing. For each function, create three node levels: function signature, full implementation, and individual code blocks (loops, conditionals, expressions). For documentation, extract hierarchical (path, value) pairs.
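For the documentation side, one simple way to obtain hierarchical (path, value) pairs is to flatten a nested JSON-like structure. The sketch below assumes dotted paths and bracketed list indices; that path syntax and the `path_value_pairs` name are choices made for this example.

```python
from typing import Any, Iterator


def path_value_pairs(doc: Any, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (dotted_path, leaf_value) pairs from nested dicts and lists."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            yield from path_value_pairs(value, child)
    elif isinstance(doc, list):
        for index, value in enumerate(doc):
            yield from path_value_pairs(value, f"{prefix}[{index}]")
    else:
        # A leaf: the accumulated path identifies where the value lives.
        yield prefix, doc
```

Each resulting pair can be embedded as its own node, so a query about encryption settings can match one configuration path without pulling in the whole document.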
Embed the query and all graph nodes using a consistent semantic representation. In practice, generate a concise textual description of each node (e.g., "function merge_sort that recursively splits and merges a list") and compute similarity against the user's problem statement. Prioritize nodes whose descriptions share key terms and semantic meaning with the query.
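A minimal sketch of this scoring step follows, assuming each node already has a short textual description. The word-count vectors stand in for a real embedding model, and `score_nodes` and the node identifiers are hypothetical names used only here.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy word-count vector; swap in a real embedding model in practice."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def score_nodes(query: str, descriptions: dict[str, str]) -> list[tuple[str, float]]:
    """Score every node description against the query, most similar first."""
    q = embed(query)
    scored = [(node_id, cosine(embed(desc), q)) for node_id, desc in descriptions.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Taking the first k entries of the returned list gives a top-k selection.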
Retrieve the top-k most relevant nodes using cosine similarity between the query embedding and node embeddings. Use three retrieval granularities in parallel:
Prune irrelevant branches from the retrieved subgraph. For each retrieved node, examine its parent and sibling nodes. Remove branches where the combined embedding diverges from the query. Re-score the pruned subgraph to confirm it remains relevant. The goal is a minimal, focused context window.
Format the retrieved context into a structured prompt. Place the most relevant code blocks closest to the generation point (Fill-in-the-Middle style). Include function signatures and docstrings as prefix context, and place the user's specific requirements as a suffix. This sandwich structure guides the model to generate code consistent with both the examples and the requirements.
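One possible way to assemble such a prompt is shown below. The section labels, the generation-point marker, and the `build_prompt` signature are assumptions for illustration; the only behavior taken from the description above is the ordering: signatures first, the most relevant block nearest the generation point, requirements last.

```python
def build_prompt(
    signatures: list[str],
    scored_blocks: list[tuple[float, str]],
    requirements: str,
) -> str:
    """Lay out prefix context, retrieved blocks, and requirements as one prompt."""
    # Sort ascending so the highest-scoring block ends up last,
    # directly before the generation point.
    ordered = [block for _, block in sorted(scored_blocks, key=lambda pair: pair[0])]
    parts = [
        "# Context: signatures and docstrings",
        *signatures,
        "# Retrieved code blocks (most relevant last)",
        *ordered,
        "# <generate solution here>",
        "# Requirements",
        requirements,
    ]
    return "\n\n".join(parts)
```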
Generate multiple candidate solutions. Produce at least 2-3 candidates: one using the PKG-augmented context, one without any retrieved context (baseline), and optionally one with a different retrieval granularity. This ensures the re-ranker has diverse candidates to choose from.
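Apply the three-stage re-ranking filter:

1. Syntactic validation: parse each candidate with `ast.parse()`. Discard any candidate that raises a `SyntaxError`.
2. Runtime execution: run each remaining candidate and discard any that crash.
3. Semantic scoring: rank the survivors by semantic similarity to the original query.

Return the top-ranked solution with an explanation of which retrieved context informed it. If the baseline (non-RAG) solution won re-ranking, note that retrieval did not improve on the model's own knowledge for this particular problem.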
Iterate if needed. If no candidate passes all three stages, relax the retrieval scope (retrieve more nodes or use a coarser granularity), regenerate candidates, and re-rank again.
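The retry behavior, together with the baseline candidate from the generation step, can be sketched as a small driver loop. The `retrieve`, `generate`, and `rerank` callables are hypothetical placeholders for whatever retrieval, model, and filtering code is in use, and doubling `k` on each round is just one way to relax the retrieval scope.

```python
from typing import Callable, Optional


def solve_with_retries(
    retrieve: Callable[[int], list[str]],
    generate: Callable[[list[str]], list[str]],
    rerank: Callable[[list[str]], list[str]],
    k: int = 3,
    max_rounds: int = 3,
) -> Optional[str]:
    """Widen retrieval until some candidate survives re-ranking, or give up."""
    for _ in range(max_rounds):
        context = retrieve(k)
        # Always include candidates generated without context (the baseline).
        candidates = generate(context) + generate([])
        ranked = rerank(candidates)
        if ranked:
            return ranked[0]
        k *= 2  # relax the retrieval scope and try again
    return None
```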
Example 1: Generating a function using codebase patterns
User: "Write a function to validate and normalize email addresses, following the patterns used in our utils/validators.py module."
Approach:
1. Parse `utils/validators.py` with AST to extract all validator functions as graph nodes (e.g., `validate_phone`, `validate_url`, `normalize_string`).
2. Retrieve the closest matches: `validate_url`, the normalization logic from `normalize_string`, and the error-handling pattern from `validate_phone`.

Output:
```python
import re


def validate_and_normalize_email(email: str) -> str:
    """Validate email format and return normalized version."""
    if not isinstance(email, str):
        raise TypeError(f"Expected str, got {type(email).__name__}")
    email = email.strip().lower()
    pattern = re.compile(
        r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    )
    if not pattern.match(email):
        raise ValueError(f"Invalid email address: {email}")
    return email
```
Example 2: Solving an algorithmic problem with retrieved code fragments
User: "Implement a function that finds the longest increasing subsequence in an array. I know there's an O(n log n) approach."
Approach:
Test each candidate against `[10, 9, 2, 5, 3, 7, 101, 18]` -> expected length 4. Select the passing candidate.

Output:
```python
import bisect


def length_of_lis(nums: list[int]) -> int:
    """Find length of longest increasing subsequence in O(n log n)."""
    tails = []
    for num in nums:
        pos = bisect.bisect_left(tails, num)
        if pos == len(tails):
            tails.append(num)
        else:
            tails[pos] = num
    return len(tails)
```
Example 3: API-dependent code generation with documentation retrieval
User: "Write a function to upload a file to S3 with server-side encryption, using the patterns from our existing storage/ module."
Approach:
1. Scan the `storage/` module files. Extract function nodes for existing upload/download operations and documentation nodes for AWS SDK configuration.
2. Retrieve documentation nodes such as `storage.config.encryption` -> "AES256" and `storage.upload.retry_policy` -> exponential_backoff, and function nodes for `upload_object()` and `configure_client()`.
3. Generate the new upload function with the `ServerSideEncryption` parameter.

Output:
```python
# get_s3_client() is assumed to be provided by the project's storage/ module.
def upload_file_encrypted(
    bucket: str,
    key: str,
    file_path: str,
    encryption: str = "AES256",
) -> dict:
    """Upload file to S3 with server-side encryption."""
    client = get_s3_client()  # reuses project's client factory
    with open(file_path, "rb") as f:
        response = client.put_object(
            Bucket=bucket,
            Key=key,
            Body=f,
            ServerSideEncryption=encryption,
        )
    return response
```
| Problem | Cause | Resolution |
|---|---|---|
| Retrieved context is semantically distant from the query | Knowledge graph lacks relevant nodes for this domain | Fall back to baseline (non-RAG) generation; the re-ranker will naturally prefer the non-augmented candidate |
| All candidates fail AST validation | Malformed retrieved context corrupting generation | Re-generate without retrieved context; investigate whether the source code nodes contain syntax errors |
| Re-ranking selects a wrong candidate | Test cases are insufficient or missing | Add edge-case tests; if no tests exist, rely on semantic scoring and flag the solution as needing manual review |
| Retrieval returns too many nodes, exceeding context window | Pruning threshold too permissive | Tighten the similarity threshold; limit to top-3 nodes per granularity level |
| Retrieved code uses deprecated APIs or incompatible versions | Source knowledge graph is stale | Validate retrieved code against current dependency versions before including it in the prompt |
Paper: Seddik et al., "Context-Augmented Code Generation Using Programming Knowledge Graphs," arXiv:2601.20810v1 (2026). https://arxiv.org/abs/2601.20810v1
Key takeaway: Fine-grained graph-structured retrieval (block-level AST nodes + pruning) combined with a three-stage re-ranker that includes non-RAG baselines yields up to 20% pass@1 improvement on HumanEval and 34% on MBPP, while avoiding regression on problems the model already solves correctly.
Replication package: https://github.com/iamshahd/ProgrammingKnowledgeGraph