.claude/skills/build-kg/SKILL.md
Build a knowledge graph from any topic. Generates an ontology, discovers sources, crawls, chunks, loads, and parses into Apache AGE (PostgreSQL).
npx skillsauth add agtm1199/build-kg build-kg
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Build a knowledge graph for: $ARGUMENTS
Activate the virtual environment before every Python command: . venv/bin/activate && <command>.
Choose a snake_case graph name for $GRAPH_NAME, e.g. kubernetes_networking.
mkdir -p kg_builds/$GRAPH_NAME
Set AGE_GRAPH_NAME=$GRAPH_NAME in .env.
python -m build_kg.setup_graph
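A graph name such as kubernetes_networking can be derived from the topic mechanically. The sketch below is one possible way to do that; the helper name graph_name_from_topic and the kg_ prefix are hypothetical and not part of build-kg. The prefix is a conservative choice, since unquoted PostgreSQL identifiers must start with a letter or underscore.

```python
import re


def graph_name_from_topic(topic: str) -> str:
    """Turn a free-form topic into a lowercase snake_case name (hypothetical helper)."""
    # Collapse every run of characters other than a-z and 0-9 into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")
    if not slug:
        raise ValueError("topic contains no usable characters")
    # Avoid a leading digit by adding an assumed "kg_" prefix.
    if slug[0].isdigit():
        slug = "kg_" + slug
    return slug


print(graph_name_from_topic("Kubernetes Networking"))  # kubernetes_networking
```

The result can then be exported as GRAPH_NAME before running the commands above.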
Design a domain ontology for $ARGUMENTS. This is the most important phase.
Identify 3-6 node types for the core entities. For each:
- label: PascalCase (e.g. Component, Algorithm)
- description: what this node represents
- properties: key-value pairs with types (string, integer, float, boolean, json)
Identify 3-8 edge types. For each:
- label: UPPER_SNAKE_CASE (e.g. USES, DEPENDS_ON)
- source and target: node labels
- description: what the relationship means
Choose a root_node: the primary node type that maps 1:1 to source fragments.
Write the json_schema: the exact JSON format the LLM should output.
Save as kg_builds/$GRAPH_NAME/ontology.yaml:
description: "<Topic> knowledge graph ontology"
nodes:
  - label: "NodeType1"
    description: "..."
    properties:
      name: "string"
      category: "string"
  - label: "NodeType2"
    description: "..."
    properties:
      name: "string"
edges:
  - label: "RELATIONSHIP_NAME"
    source: "NodeType1"
    target: "NodeType2"
    description: "..."
root_node: "NodeType1"
json_schema: |
  {
    "entities": [
      {"_label": "NodeType1|NodeType2", "name": "...", "category": "..."}
    ],
    "relationships": [
      {"_label": "RELATIONSHIP_NAME", "_from_index": 0, "_to_index": 1}
    ]
  }
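The json_schema above links relationships to entities through _from_index and _to_index. Assuming these are zero-based positions in the entities list (the field names suggest this, but the source does not state it), the following sketch shows how such output could be turned into readable triples. The function name and the sample entity names are illustrative only.

```python
import json


def resolve_relationships(payload: dict) -> list[tuple[str, str, str]]:
    """Return (source name, relationship label, target name) triples."""
    entities = payload["entities"]
    triples = []
    for rel in payload["relationships"]:
        src, dst = rel["_from_index"], rel["_to_index"]
        # Skip relationships whose indices point outside the entities list.
        if not (0 <= src < len(entities) and 0 <= dst < len(entities)):
            continue
        triples.append((entities[src]["name"], rel["_label"], entities[dst]["name"]))
    return triples


# Illustrative LLM output; the names are made up for this example.
sample = json.loads("""
{
  "entities": [
    {"_label": "Component", "name": "example-component", "category": "demo"},
    {"_label": "Algorithm", "name": "example-algorithm"}
  ],
  "relationships": [
    {"_label": "USES", "_from_index": 0, "_to_index": 1},
    {"_label": "USES", "_from_index": 0, "_to_index": 5}
  ]
}
""")
print(resolve_relationships(sample))
```

The second relationship is dropped because index 5 does not exist, which is the kind of malformed output worth guarding against.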
python -m build_kg.setup_graph --ontology kg_builds/$GRAPH_NAME/ontology.yaml
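Before registering an ontology, it may help to check it against the rules stated above: 3-6 node types in PascalCase, 3-8 edge types in UPPER_SNAKE_CASE, edge endpoints that name declared nodes, and a root_node that is one of the node labels. The sketch below works on an already-parsed dictionary (for example, the result of loading ontology.yaml with a YAML library); check_ontology is a hypothetical helper, not a build-kg function.

```python
import re

PASCAL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
UPPER_SNAKE = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$")


def check_ontology(onto: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    nodes = onto.get("nodes", [])
    edges = onto.get("edges", [])
    labels = {n["label"] for n in nodes}
    if not 3 <= len(nodes) <= 6:
        problems.append(f"expected 3-6 node types, found {len(nodes)}")
    if not 3 <= len(edges) <= 8:
        problems.append(f"expected 3-8 edge types, found {len(edges)}")
    for n in nodes:
        if not PASCAL.match(n["label"]):
            problems.append(f"node label {n['label']!r} is not PascalCase")
    for e in edges:
        if not UPPER_SNAKE.match(e["label"]):
            problems.append(f"edge label {e['label']!r} is not UPPER_SNAKE_CASE")
        # Both endpoints must refer to a declared node label.
        for end in ("source", "target"):
            if e[end] not in labels:
                problems.append(f"edge {e['label']!r} {end} {e[end]!r} is not a node label")
    if onto.get("root_node") not in labels:
        problems.append("root_node must be one of the node labels")
    return problems


# The template ontology above has only two nodes and one edge, so it is flagged.
template = {
    "nodes": [{"label": "NodeType1"}, {"label": "NodeType2"}],
    "edges": [{"label": "RELATIONSHIP_NAME", "source": "NodeType1", "target": "NodeType2"}],
    "root_node": "NodeType1",
}
for problem in check_ontology(template):
    print(problem)
```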
Find 5-15 authoritative sources about $ARGUMENTS using web search.
Search with multiple queries:
- "$ARGUMENTS" official documentation
- "$ARGUMENTS" comprehensive guide
- "$ARGUMENTS" reference manual
- "$ARGUMENTS" tutorial overview
- "$ARGUMENTS" specification
Evaluate each result: Is it authoritative? Does it have substantial text? Is it crawlable?
Organize into priority tiers:
Create kg_builds/$GRAPH_NAME/manifest.json:
{
  "topic": "$ARGUMENTS",
  "graph_name": "$GRAPH_NAME",
  "sources": [
    {
      "source_name": "descriptive_short_name",
      "url": "https://...",
      "title": "Page Title",
      "authority": "Organization Name",
      "jurisdiction": "",
      "doc_type": "documentation",
      "priority": "P1",
      "depth": 2,
      "max_pages": 50,
      "delay": 1500
    }
  ],
  "defaults": {
    "jurisdiction": "",
    "authority": "",
    "doc_type": "documentation"
  }
}
For each source in the manifest:
build-kg-crawl --url "$URL" --output kg_builds/$GRAPH_NAME/crawled/$SOURCE_NAME --depth $DEPTH --pages $MAX_PAGES --delay $DELAY --format markdown
If a crawl fails, note it and continue. Do not retry more than once.
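The crawl loop and its retry rule can be sketched as follows. The command line mirrors the build-kg-crawl invocation above, with flag values taken from each manifest entry; crawl_command and crawl_all are hypothetical helper names, and the run parameter exists only so the loop can be exercised without launching real crawls.

```python
import shlex
import subprocess


def crawl_command(source: dict, graph_name: str) -> list[str]:
    """Build the build-kg-crawl argument list for one manifest source."""
    return [
        "build-kg-crawl",
        "--url", source["url"],
        "--output", f"kg_builds/{graph_name}/crawled/{source['source_name']}",
        "--depth", str(source["depth"]),
        "--pages", str(source["max_pages"]),
        "--delay", str(source["delay"]),
        "--format", "markdown",
    ]


def crawl_all(sources: list[dict], graph_name: str, run=subprocess.run) -> list[str]:
    """Crawl every source, retry a failure once, and return the names that still failed."""
    failed = []
    for source in sources:
        cmd = crawl_command(source, graph_name)
        for _attempt in range(2):  # first try plus at most one retry
            if run(cmd).returncode == 0:
                break
        else:
            # Both attempts failed: note it and continue with the next source.
            failed.append(source["source_name"])
    return failed


example = {"source_name": "descriptive_short_name", "url": "https://example.com/docs",
           "depth": 2, "max_pages": 50, "delay": 1500}
print(shlex.join(crawl_command(example, "kubernetes_networking")))
```

Passing argument lists instead of a shell string avoids quoting problems when a URL contains characters such as & or ?.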
build-kg-chunk kg_builds/$GRAPH_NAME/crawled kg_builds/$GRAPH_NAME/chunks --strategy by_title --max-chars 1000
build-kg-load kg_builds/$GRAPH_NAME/chunks --manifest kg_builds/$GRAPH_NAME/manifest.json
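To give a feel for what the --strategy by_title and --max-chars 1000 options in the chunk step imply, here is a minimal illustration of title-based chunking with a size cap. It is not the implementation used by build-kg-chunk, whose exact behavior is not described here; it only assumes that sections start at markdown headings and that oversized sections are split further.

```python
import re


def chunk_by_title(markdown: str, max_chars: int = 1000) -> list[str]:
    """Split markdown at headings, then cap every piece at max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    # Start a new section before each line that begins with 1-6 '#' characters.
    sections = re.split(r"(?m)^(?=#{1,6}\s)", markdown)
    chunks = []
    for section in sections:
        text = section.strip()
        while len(text) > max_chars:
            # Prefer to cut at the last paragraph break that still fits.
            cut = text.rfind("\n\n", 0, max_chars)
            if cut <= 0:
                cut = max_chars
            chunks.append(text[:cut].strip())
            text = text[cut:].strip()
        if text:
            chunks.append(text)
    return chunks


doc = "# Intro\nShort text.\n\n## Details\n" + "x" * 25
for chunk in chunk_by_title(doc, max_chars=20):
    print(len(chunk), repr(chunk))
```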
Small datasets (< 500 fragments) — sync:
build-kg-parse --ontology kg_builds/$GRAPH_NAME/ontology.yaml
Large datasets (500+ fragments) — batch (50% cheaper):
build-kg-parse-batch prepare --ontology kg_builds/$GRAPH_NAME/ontology.yaml --output kg_builds/$GRAPH_NAME/batch_requests.jsonl
build-kg-parse-batch submit kg_builds/$GRAPH_NAME/batch_requests.jsonl
build-kg-parse-batch status $BATCH_ID --watch
build-kg-parse-batch process $BATCH_ID --ontology kg_builds/$GRAPH_NAME/ontology.yaml
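The choice between the two parsing modes depends only on the fragment count, so it can be expressed as a small decision function. The sketch below returns the command lines listed above for a given count; parse_commands is a hypothetical helper, and $BATCH_ID is left as a placeholder because its value is presumably reported by the submit step.

```python
def parse_commands(fragment_count: int, graph_name: str) -> list[str]:
    """Return the parse commands to run: sync below 500 fragments, batch otherwise."""
    ontology = f"kg_builds/{graph_name}/ontology.yaml"
    if fragment_count < 500:
        return [f"build-kg-parse --ontology {ontology}"]
    requests = f"kg_builds/{graph_name}/batch_requests.jsonl"
    return [
        f"build-kg-parse-batch prepare --ontology {ontology} --output {requests}",
        f"build-kg-parse-batch submit {requests}",
        # $BATCH_ID stays a placeholder until the batch has been submitted.
        "build-kg-parse-batch status $BATCH_ID --watch",
        f"build-kg-parse-batch process $BATCH_ID --ontology {ontology}",
    ]


for command in parse_commands(1200, "kubernetes_networking"):
    print(command)
```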
SELECT * FROM cypher('$GRAPH_NAME', $$ MATCH (n) RETURN label(n) AS type, count(*) AS total $$) AS (type agtype, total agtype);
SELECT * FROM cypher('$GRAPH_NAME', $$ MATCH ()-[r]->() RETURN type(r) AS rel, count(*) AS total $$) AS (rel agtype, total agtype);
SELECT * FROM cypher('$GRAPH_NAME', $$ MATCH (a)-[r]->(b) RETURN a, type(r), b LIMIT 10 $$) AS (a agtype, rel agtype, b agtype);
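The three queries above share one shape: a Cypher statement wrapped in AGE's cypher() function with an agtype column list. The sketch below builds such statements from Python; age_query is a hypothetical helper, and the name check is an added safeguard because the graph name is interpolated into the SQL text. Each database session also needs the AGE extension loaded and ag_catalog on the search path before these queries run.

```python
import re

# Standard Apache AGE session setup, run once per connection.
SESSION_SETUP = "LOAD 'age'; SET search_path = ag_catalog, \"$user\", public;"


def age_query(graph_name: str, cypher: str, columns: list[str]) -> str:
    """Wrap a Cypher statement in a SELECT over AGE's cypher() function."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", graph_name):
        raise ValueError(f"unsafe graph name: {graph_name!r}")
    if "$$" in cypher:
        raise ValueError("Cypher text must not contain the $$ delimiter")
    # Every returned column is declared as agtype, matching the queries above.
    column_list = ", ".join(f"{name} agtype" for name in columns)
    return f"SELECT * FROM cypher('{graph_name}', $$ {cypher} $$) AS ({column_list});"


print(SESSION_SETUP)
print(age_query(
    "kubernetes_networking",
    "MATCH (n) RETURN label(n) AS type, count(*) AS total",
    ["type", "total"],
))
```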