agentic/code/addons/doc-intelligence/skills/source-unifier/SKILL.md
Merge multiple documentation sources (docs, GitHub, PDF) with conflict detection. Use when combining docs + code for complete skill coverage.
npx skillsauth add jmagly/aiwg source-unifierInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Single responsibility: Intelligently merge documentation from multiple sources (websites, GitHub repos, PDFs) while detecting and transparently reporting conflicts between documented and implemented behavior. (BP-4)
Before executing, VERIFY:
DO NOT merge without inspecting each source first.
ASK USER instead of guessing when:
NEVER silently resolve conflicts. Always report discrepancies.
| Context Type | Included | Excluded | |--------------|----------|----------| | RELEVANT | All specified sources, merge config | Unrelated documentation | | PERIPHERAL | Version history for context | Other projects | | DISTRACTOR | Previous merge attempts | Unrelated codebases |
| Type | Severity | Description | Example |
|------|----------|-------------|---------|
| Missing in code | HIGH | Documented but not implemented | API endpoint in docs, not in code |
| Missing in docs | MEDIUM | Implemented but not documented | Hidden feature in code |
| Signature mismatch | MEDIUM | Different parameters/types | func(a, b) vs func(a, b, c=None) |
| Description mismatch | LOW | Different explanations | Wording differences |
# Test documentation URL
curl -I https://docs.example.com/
# Test GitHub repo
gh repo view owner/repo --json name,description
# Test PDF file
file manual.pdf && pdfinfo manual.pdf
{
"name": "myframework",
"description": "Complete framework knowledge from docs + code",
"merge_mode": "rule-based",
"conflict_resolution": {
"missing_in_code": "warn",
"missing_in_docs": "include",
"signature_mismatch": "show_both",
"description_mismatch": "prefer_docs"
},
"sources": [
{
"type": "documentation",
"base_url": "https://docs.example.com/",
"extract_api": true,
"max_pages": 200
},
{
"type": "github",
"repo": "owner/myframework",
"include_code": true,
"code_analysis_depth": "surface",
"max_issues": 100
},
{
"type": "pdf",
"path": "docs/manual.pdf",
"extract_tables": true
}
]
}
Option A: With skill-seekers
skill-seekers unified --config unified-config.json
Option B: Manual merge workflow
The unifier generates a conflict report:
# Conflict Report: myframework
## Summary
- Total APIs analyzed: 245
- Conflicts detected: 18
- Missing in code: 3 (HIGH)
- Missing in docs: 8 (MEDIUM)
- Signature mismatches: 5 (MEDIUM)
- Description mismatches: 2 (LOW)
## HIGH Severity Conflicts
### `deprecated_function()`
- **Status**: Documented but not found in code
- **Documentation**: "Use this function to..."
- **Code**: NOT FOUND
- **Recommendation**: Remove from docs or implement
## MEDIUM Severity Conflicts
### `process_data(input: str)`
- **Status**: Signature mismatch
- **Documentation**: `process_data(input: str)`
- **Code**: `process_data(input: str, validate: bool = True)`
- **Recommendation**: Update documentation to include `validate` parameter
# Check merged skill structure
ls -la output/myframework/
# Verify conflict annotations
grep -r "⚠️\|Conflict\|WARNING" output/myframework/references/
# Count conflict markers
grep -c "Conflict" output/myframework/references/*.md
On error:
Source unavailable → Skip source, note in reportParse error → Check source format, retry with different parserMemory error → Process sources sequentiallyConflict overflow → Increase conflict threshold or filter by severityState saved to: .aiwg/working/checkpoints/source-unifier/
checkpoints/source-unifier/
├── source_1_docs.json # Processed docs
├── source_2_github.json # Processed GitHub
├── source_3_pdf.json # Processed PDF
├── conflicts.json # Detected conflicts
└── merge_progress.json # Current merge state
Resume: skill-seekers unified --config config.json --resume
output/myframework/
├── SKILL.md # Main skill with conflict summary
├── references/
│ ├── index.md # Unified index
│ ├── api_reference.md # Merged API docs (with conflict markers)
│ ├── guides.md # Merged guides
│ └── conflicts.md # Detailed conflict report
├── sources/
│ ├── documentation.md # Original docs content
│ ├── github.md # GitHub-extracted content
│ └── pdf.md # PDF-extracted content
└── metadata/
├── sources.json # Source metadata
└── conflict_summary.json # Machine-readable conflicts
Merged content includes inline conflict markers:
#### `process_data(input: str, validate: bool = True)`
⚠️ **Conflict**: Documentation signature differs from implementation
**Documentation says:**
```python
def process_data(input: str) -> dict:
"""Process input data and return results."""
Code implementation:
def process_data(input: str, validate: bool = True) -> dict:
"""Process input data with optional validation."""
Resolution: Documentation should be updated to include the validate parameter added in v2.3.
## Merge Modes
| Mode | Description | Use Case |
|------|-------------|----------|
| `rule-based` | Apply predefined rules for conflict resolution | Fast, deterministic |
| `ai-enhanced` | Use AI to intelligently merge conflicting content | Better quality, slower |
| `manual` | Generate conflicts only, user resolves | Full control |
## Troubleshooting
| Issue | Diagnosis | Solution |
|-------|-----------|----------|
| Too many conflicts | Sources very different | Filter by severity, merge incrementally |
| False positives | Parser differences | Normalize API extraction |
| Missing content | Source incomplete | Add supplementary source |
| Merge too slow | Large sources | Use parallel processing |
## References
- Skill Seekers Unified Scraping: https://github.com/jmagly/Skill_Seekers/blob/main/docs/UNIFIED_SCRAPING.md
- REF-001: Production-Grade Agentic Workflows (BP-4, BP-6 model consortium parallel)
- REF-002: LLM Failure Modes (Archetype 1-4 mitigations)
data-ai
Report which research-corpus radar sidecars are overdue for refresh. Computes staleness (days since last refresh vs the cadence window) for every radar, sorted most-overdue-first. Runs via `aiwg corpus radar-status`.
data-ai
Aggregate research-corpus radar sidecars into a corpus or per-cluster freshness report — totals, overdue count, per-cluster / per-GRADE / per-trajectory breakdowns, an overdue table, and per-radar rationale snippets. Runs via `aiwg corpus radar-report`.
testing
Scaffold radar/freshness sidecars for research-corpus REFs. Pulls title/authors from the citation sidecar and GRADE from the analysis doc, defaults the refresh cadence from GRADE and the cluster from a corpus-local map, and stamps documentation/radar/REF-XXX-radar.md. Runs via `aiwg corpus radar-init`.
data-ai
Compute an entity's publication trajectory — per-year paper counts, topic drift, hot-streak detection (≥3 consecutive A-grade years), and career phase. Runs via `aiwg corpus profile-temporal`.