static-analysis/skills/sarif-parsing/SKILL.md
Parse, analyze, and process SARIF (Static Analysis Results Interchange Format) files. Use when reading security scan results, aggregating findings from multiple tools, deduplicating alerts, extracting specific vulnerabilities, or integrating SARIF data into CI/CD pipelines.
Parse, analyze, and process SARIF files from static analysis tools. Out of scope: running scans (use CodeQL/Semgrep skills), writing rules, analyzing source code directly.
SARIF 2.1.0 is the current OASIS standard:
sarifLog
├── version: "2.1.0"
└── runs[]
    ├── tool
    │   ├── driver
    │   │   ├── name (required)
    │   │   ├── version
    │   │   └── rules[]
    │   └── extensions[]
    ├── results[]
    │   ├── ruleId
    │   ├── level (none/note/warning/error)
    │   ├── message.text
    │   ├── locations[]
    │   │   └── physicalLocation
    │   │       ├── artifactLocation.uri
    │   │       └── region (startLine, startColumn, etc.)
    │   ├── fingerprints{}
    │   └── partialFingerprints{}
    └── artifacts[]
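To make the tree above concrete, here is a minimal sketch that walks a hand-written log and pulls out the fields most queries need. The `sample_log` data and the `iter_findings` helper are illustrative assumptions, not output from a real tool or part of any library.

```python
# Minimal hand-written SARIF log (illustrative data, not real tool output).
sample_log = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "ExampleScanner", "rules": [{"id": "EX001"}]}},
        "results": [{
            "ruleId": "EX001",
            "level": "error",
            "message": {"text": "Hard-coded credential"},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": "src/auth.py"},
                "region": {"startLine": 12},
            }}],
        }],
    }],
}


def iter_findings(log: dict):
    """Yield (tool, ruleId, uri, startLine) for every result in every run."""
    for run in log.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        # "results" may be absent or null, so fall back to an empty list.
        for result in run.get("results") or []:
            locations = result.get("locations") or [{}]
            phys = locations[0].get("physicalLocation", {})
            yield (tool,
                   result.get("ruleId"),
                   phys.get("artifactLocation", {}).get("uri"),
                   phys.get("region", {}).get("startLine"))


findings = list(iter_findings(sample_log))
print(findings)
```

Only the first location of each result is read here; results with several locations would need a loop over `locations`.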
Fingerprints hash content (code snippet, rule ID, relative location) to create identifiers that stay stable across environments. Without them, baseline comparison, regression detection, and suppression become unreliable, because tools report different paths in different environments.
| Use Case | Tool | Install |
|----------|------|---------|
| Quick CLI queries | jq | brew install jq / apt install jq |
| Python (simple) | pysarif | pip install pysarif |
| Python (advanced) | sarif-tools | pip install sarif-tools |
| .NET | SARIF SDK | NuGet |
| JavaScript | sarif-js | npm |
| Go | garif | go get github.com/chavacava/garif |
| Validation | SARIF Validator | sarifweb.azurewebsites.net |
# Count total findings
jq '[.runs[].results[]] | length' results.sarif
# List all triggered rule IDs
jq '[.runs[].results[].ruleId] | unique' results.sarif
# Extract errors only
jq '.runs[].results[] | select(.level == "error")' results.sarif
# Findings with file locations
jq '.runs[].results[] | {
  rule: .ruleId,
  message: .message.text,
  file: .locations[0].physicalLocation.artifactLocation.uri,
  line: .locations[0].physicalLocation.region.startLine
}' results.sarif
# Count per rule at error level
jq '[.runs[].results[] | select(.level == "error")] | group_by(.ruleId) | map({rule: .[0].ruleId, count: length})' results.sarif
# Filter by specific file (each result is emitted once, even if several locations match)
jq --arg file "src/auth.py" '.runs[].results[] | select(any(.locations[]?; (.physicalLocation.artifactLocation.uri // "") | contains($file)))' results.sarif
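When jq is not available, the per-rule count above can be approximated in plain Python. This is a sketch using only the standard library; `count_by_rule` and the sample data are hypothetical names introduced for illustration.

```python
from collections import Counter


def count_by_rule(log: dict, level: str = "error") -> Counter:
    """Count results per ruleId whose explicit level equals `level`."""
    counts = Counter()
    for run in log.get("runs", []):
        for result in run.get("results") or []:
            # Like the jq filter, this only matches an explicit "level" value.
            if result.get("level") == level:
                counts[result.get("ruleId", "<no ruleId>")] += 1
    return counts


# Illustrative data: two runs, three error-level results across two rules.
demo_log = {
    "version": "2.1.0",
    "runs": [
        {"results": [{"ruleId": "A", "level": "error"},
                     {"ruleId": "B", "level": "warning"}]},
        {"results": [{"ruleId": "A", "level": "error"},
                     {"ruleId": "C", "level": "error"}]},
    ],
}
error_counts = count_by_rule(demo_log)
print(error_counts.most_common())
```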
from pysarif import load_from_file, save_to_file

sarif = load_from_file("results.sarif")
for run in sarif.runs:
    # A run may carry no results at all, so guard against None.
    for result in run.results or []:
        print(f" [{result.level}] {result.rule_id}: {result.message.text}")
        if result.locations:
            loc = result.locations[0].physical_location
            if loc and loc.artifact_location:
                print(f" File: {loc.artifact_location.uri}:{loc.region.start_line if loc.region else '?'}")

save_to_file(sarif, "modified.sarif")
from sarif import loader
sarif_data = loader.load_sarif_file("results.sarif")
sarif_set = loader.load_sarif_files("tool1.sarif", "tool2.sarif")  # multi-file: paths are passed as separate arguments
report = sarif_data.get_report()
errors = report.get_issue_type_histogram_for_severity("error")
sarif-tools CLI:
sarif summary results.sarif
sarif ls --level error results.sarif
sarif diff baseline.sarif current.sarif # find new/fixed issues
sarif csv -o results.csv results.sarif
sarif html -o report.html results.sarif
import json

def aggregate_sarif_files(sarif_paths: list[str]) -> dict:
    """Concatenate the runs of several SARIF files into one log."""
    aggregated = {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": []
    }
    for path in sarif_paths:
        with open(path) as f:
            aggregated["runs"].extend(json.load(f).get("runs", []))
    return aggregated

def deduplicate_results(sarif: dict) -> dict:
    """Remove duplicates based on fingerprints, falling back to rule+location key."""
    seen = set()  # shared across runs, so cross-tool duplicates are dropped too
    for run in sarif["runs"]:
        unique = []
        for result in run.get("results") or []:
            fp = None
            if result.get("partialFingerprints"):
                fp = tuple(sorted(result["partialFingerprints"].items()))
            elif result.get("fingerprints"):
                fp = tuple(sorted(result["fingerprints"].items()))
            else:
                # "or [{}]" also covers an empty locations array.
                loc = (result.get("locations") or [{}])[0]
                phys = loc.get("physicalLocation", {})
                fp = (result.get("ruleId"),
                      phys.get("artifactLocation", {}).get("uri"),
                      phys.get("region", {}).get("startLine"))
            if fp not in seen:
                seen.add(fp)
                unique.append(result)
        run["results"] = unique
    return sarif
Different tools report paths differently (absolute, relative, URI-encoded):
from urllib.parse import unquote
from pathlib import Path

def normalize_path(uri: str, base_path: str = "") -> str:
    # Drop the file:// scheme, then decode %-escapes such as %20.
    if uri.startswith("file://"):
        uri = uri[7:]
    uri = unquote(uri)
    # Anchor relative paths at base_path when one is given.
    if not Path(uri).is_absolute() and base_path:
        uri = str(Path(base_path) / uri)
    return str(Path(uri))
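Results from different tools only line up if their paths share one form. As a complement to the function above, the following sketch goes the other way and reduces paths to a repository-relative POSIX form. The `to_repo_relative` helper and the `/home/ci/repo` root are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def to_repo_relative(uri: str, repo_root: str) -> str:
    """Return a POSIX-style path relative to repo_root when the path lies under it."""
    parsed = urlparse(uri)
    # Only file: URIs have their path component extracted; anything else is
    # treated as a plain (possibly %-encoded) path.
    path = unquote(parsed.path if parsed.scheme == "file" else uri)
    path = path.replace("\\", "/")
    root = repo_root.replace("\\", "/").rstrip("/")
    if path.startswith(root + "/"):
        path = path[len(root) + 1:]
    # PurePosixPath collapses "./" segments and duplicate slashes.
    return str(PurePosixPath(path))


print(to_repo_relative("file:///home/ci/repo/src/auth.py", "/home/ci/repo"))
print(to_repo_relative("./src/auth.py", "/home/ci/repo"))
```

Paths outside `repo_root` are returned unchanged apart from decoding, so callers can still tell them apart from in-repo files.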
Fingerprints may differ when file paths, tool versions, or code formatting change between runs. Use a content-based fingerprint as fallback:
import hashlib
from typing import Optional

def compute_stable_fingerprint(result: dict, file_content: Optional[str] = None) -> str:
    # Rule ID plus a message prefix identify the kind of finding.
    components = [result.get("ruleId", ""), result.get("message", {}).get("text", "")[:100]]
    if file_content and result.get("locations"):
        region = result["locations"][0].get("physicalLocation", {}).get("region", {})
        if region.get("startLine"):
            lines = file_content.split("\n")
            idx = region["startLine"] - 1  # SARIF line numbers are 1-based
            if 0 <= idx < len(lines):
                # The stripped source line survives path and indentation changes.
                components.append(lines[idx].strip())
    return hashlib.sha256("".join(components).encode()).hexdigest()[:16]
SARIF allows many optional fields. Use defensive access:
def safe_get_location(result: dict) -> tuple[str, int]:
    try:
        loc = result.get("locations", [{}])[0]
        phys = loc.get("physicalLocation", {})
        return (phys.get("artifactLocation", {}).get("uri", "unknown"),
                phys.get("region", {}).get("startLine", 0))
    except (IndexError, KeyError, TypeError):
        # Empty locations array or unexpected types: report a placeholder.
        return "unknown", 0
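`level` is one of those optional fields. In SARIF 2.1.0 a result without `level` takes the level from its rule's `defaultConfiguration`, and that in turn defaults to "warning". The sketch below resolves it under simplifying assumptions: it ignores `kind` and any per-invocation configuration overrides, and only looks at rules in `tool.driver`. The `effective_level` name is hypothetical.

```python
def effective_level(result: dict, run: dict) -> str:
    """Resolve a result's level, falling back to the rule default, then "warning"."""
    level = result.get("level")
    if level:
        return level
    rules = run.get("tool", {}).get("driver", {}).get("rules") or []
    idx = result.get("ruleIndex")
    if isinstance(idx, int) and 0 <= idx < len(rules):
        # ruleIndex points straight into tool.driver.rules.
        rule = rules[idx]
    else:
        # Otherwise look the rule up by its id.
        rule = next((r for r in rules if r.get("id") == result.get("ruleId")), None)
    if rule:
        return rule.get("defaultConfiguration", {}).get("level", "warning")
    return "warning"


# Illustrative run: R1 defaults to "error", R2 declares no default.
demo_run = {"tool": {"driver": {"name": "ExampleScanner", "rules": [
    {"id": "R1", "defaultConfiguration": {"level": "error"}},
    {"id": "R2"},
]}}}
print(effective_level({"ruleId": "R1"}, demo_run))
```

Filters such as `select(.level == "error")` miss results that rely on the rule default, so resolving the level first gives more complete counts.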
For 100MB+ SARIF files, stream instead of loading entirely:
import ijson  # pip install ijson

def stream_results(sarif_path: str):
    # Yields one result dict at a time without loading the whole file.
    with open(sarif_path, "rb") as f:
        for result in ijson.items(f, "runs.item.results.item"):
            yield result
Validate structure before processing:
# ajv-cli
ajv validate -s sarif-schema-2.1.0.json -d results.sarif
# Python (requires: pip install jsonschema)
import json
from jsonschema import validate, ValidationError

def validate_sarif(sarif_path: str, schema_path: str) -> bool:
    with open(sarif_path) as f:
        sarif = json.load(f)
    with open(schema_path) as f:
        schema = json.load(f)
    try:
        validate(sarif, schema)
        return True
    except ValidationError as e:
        print(f"Validation error: {e.message}")
        return False
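When the schema file is not at hand, a cheap structural pre-check can still catch the most common breakage before processing. This is a sketch that uses only the standard library and tests a handful of fields. It is not a substitute for schema validation, and `quick_check` is a hypothetical helper.

```python
def quick_check(log) -> list[str]:
    """Return a list of structural problems; an empty list means none were found."""
    if not isinstance(log, dict):
        return ["top level is not a JSON object"]
    problems = []
    if log.get("version") != "2.1.0":
        problems.append(f"unexpected version: {log.get('version')!r}")
    runs = log.get("runs")
    if not isinstance(runs, list):
        problems.append("'runs' is missing or not an array")
        return problems
    for i, run in enumerate(runs):
        # tool.driver.name is the one required name in every run.
        name = None
        if isinstance(run, dict):
            name = run.get("tool", {}).get("driver", {}).get("name")
        if not name:
            problems.append(f"runs[{i}].tool.driver.name is missing")
    return problems


print(quick_check({"version": "2.1.0", "runs": [{"tool": {"driver": {"name": "X"}}}]}))
```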
- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
- name: Check for high severity
  run: |
    HIGH_COUNT=$(jq '[.runs[].results[] | select(.level == "error")] | length' results.sarif)
    if [ "$HIGH_COUNT" -gt 0 ]; then
      echo "Found $HIGH_COUNT high severity issues"
      exit 1
    fi
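The shell gate above fails only on explicit `error` results. If a configurable threshold is wanted, the same check could be sketched in Python as follows. `gate_exit_code` and `LEVEL_RANK` are hypothetical names, and treating a missing `level` as "warning" is a simplification that skips rule-level defaults.

```python
LEVEL_RANK = {"none": 0, "note": 1, "warning": 2, "error": 3}


def gate_exit_code(log: dict, fail_on: str = "error") -> int:
    """Return 1 if any result is at or above `fail_on`, else 0."""
    threshold = LEVEL_RANK[fail_on]
    for run in log.get("runs", []):
        for result in run.get("results") or []:
            # Results without an explicit level are treated as "warning" here.
            level = result.get("level", "warning")
            if LEVEL_RANK.get(level, 0) >= threshold:
                return 1
    return 0


# Illustrative log with one warning and one note.
gate_log = {"version": "2.1.0", "runs": [{"results": [
    {"ruleId": "A", "level": "warning"},
    {"ruleId": "B", "level": "note"},
]}]}
print(gate_exit_code(gate_log), gate_exit_code(gate_log, fail_on="warning"))
```

In a pipeline step, the return value would be passed to `sys.exit` after loading `results.sarif` with `json.load`.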
from sarif import loader

# get_fingerprint is not defined in this snippet: supply a function that maps
# a result dict to a hashable key (for example, its fingerprints).
def check_for_regressions(baseline: str, current: str) -> int:
    baseline_fps = {get_fingerprint(r) for r in loader.load_sarif_file(baseline).get_results()}
    new_issues = [r for r in loader.load_sarif_file(current).get_results()
                  if get_fingerprint(r) not in baseline_fps]
    return len(new_issues)
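The regression check above calls `get_fingerprint`, which the snippet does not define. One possible shape for it, offered as an assumption rather than the original author's helper, keys on the tool-provided fingerprints and falls back to rule plus first location. The `new_results` function mirrors the check on plain result dicts so it can be tried without sarif-tools.

```python
def get_fingerprint(result: dict) -> tuple:
    """Hashable key: tool fingerprints if present, else rule + first location."""
    fps = result.get("fingerprints") or result.get("partialFingerprints")
    if fps:
        return ("fp",) + tuple(sorted(fps.items()))
    phys = (result.get("locations") or [{}])[0].get("physicalLocation", {})
    return ("loc",
            result.get("ruleId"),
            phys.get("artifactLocation", {}).get("uri"),
            phys.get("region", {}).get("startLine"))


def new_results(baseline: list[dict], current: list[dict]) -> list[dict]:
    """Return the results in `current` whose key does not appear in `baseline`."""
    known = {get_fingerprint(r) for r in baseline}
    return [r for r in current if get_fingerprint(r) not in known]


# Illustrative data: one known finding, one new finding without fingerprints.
old = [{"ruleId": "A", "fingerprints": {"v1": "abc"}}]
added = {"ruleId": "B", "locations": [{"physicalLocation": {
    "artifactLocation": {"uri": "x.py"}, "region": {"startLine": 3}}}]}
regressions = new_results(old, old + [added])
print(len(regressions))
```

The location fallback shifts whenever lines move, so it tends to report moved findings as new. Content-based fingerprints are the more stable choice when tools do not supply their own.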