helpers/skills/python-packaging-binary-audit/SKILL.md
Scan a Python package repository for compiled/binary files using Fromager-style detection and malcontent YARA analysis, then triage findings with deterministic rules and AI reasoning to produce a structured risk report section.
Scans a Python package repository for compiled or binary files using Fromager-style extension and magic-header detection, then runs malcontent YARA-based analysis on any detected binaries. Produces a self-contained "Binary Scan" report section with triaged findings and a risk assessment.
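The extension-plus-magic-header idea can be sketched in a few lines of Python. This is a minimal illustration, not the `scan_binaries.py` implementation. The suffix list is an assumption rather than Fromager's actual list, the helper names (`sniff_magic`, `classify_file`, `scan_tree`) are hypothetical, and only the ELF, Mach-O, and `ar` signatures are checked. The finding fields mirror the ones the scanner is described as emitting.

```python
from pathlib import Path
from typing import Optional

# ASSUMPTION: illustrative suffix list, not Fromager's exact list.
BINARY_SUFFIXES = {".so", ".pyd", ".dll", ".dylib", ".a", ".o", ".lib", ".exe"}

# Leading bytes of common native binary formats.
MAGIC_HEADERS = {
    b"\x7fELF": "ELF",
    b"\xfe\xed\xfa\xce": "MachO",  # 32-bit, big-endian
    b"\xfe\xed\xfa\xcf": "MachO",  # 64-bit, big-endian
    b"\xce\xfa\xed\xfe": "MachO",  # 32-bit, little-endian
    b"\xcf\xfa\xed\xfe": "MachO",  # 64-bit, little-endian
    b"!<arch>\n": "ar_archive",
}


def sniff_magic(path: Path) -> Optional[str]:
    """Return the format name if the file starts with a known magic header."""
    with path.open("rb") as fh:
        head = fh.read(8)
    for magic, name in MAGIC_HEADERS.items():
        if head.startswith(magic):
            return name
    return None


def classify_file(path: Path) -> Optional[dict]:
    """Build a finding shaped like the scanner's JSON, or None if not binary."""
    magic = sniff_magic(path)
    if path.suffix.lower() in BINARY_SUFFIXES:
        match_type = "extension"
    elif magic is not None:
        match_type = "magic_header"  # e.g. libfoo.so.1, whose suffix is ".1"
    else:
        return None
    finding = {"path": str(path), "match_type": match_type,
               "suffix": path.suffix, "size": path.stat().st_size}
    if magic is not None:
        finding["magic"] = magic  # optional field, present only when recognized
    return finding


def scan_tree(root: Path) -> dict:
    """Walk a repository and return {"total": ..., "findings": [...]}."""
    findings = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip VCS metadata, symlinks, and directories.
        if ".git" in rel.parts or path.is_symlink() or not path.is_file():
            continue
        finding = classify_file(path)
        if finding is not None:
            finding["path"] = rel.as_posix()  # report repo-relative paths
            findings.append(finding)
    return {"total": len(findings), "findings": findings}
```

Checking the header as well as the suffix is what catches versioned shared libraries such as `libfoo.so.1`, whose `Path.suffix` is `.1`.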
When an output file is requested, its first line is `RISK_RATING:<value>`, so the orchestrator can parse the rating without reading the full report.

Run the binary scanner to find compiled files using Fromager-style extension and magic-header detection:
```bash
STAGING_DIR=$(mktemp -d -t malcontent-staging-XXXXXX)
./scripts/scan_binaries.py --stage-to "$STAGING_DIR" "<repo-path>"
```
This outputs JSON to stdout with `total` and `findings` fields. Each finding has `path`, `match_type` (`extension` or `magic_header`), `suffix`, `size`, and optionally `magic` (`ELF`, `MachO`, `ar_archive`, etc.).
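A consumer of that JSON might validate each finding and build a one-line summary before deciding whether to continue. The sketch below assumes exactly the fields listed above; `load_findings` and `summarize` are hypothetical helpers, and an empty result maps to the "No binary files detected" note.

```python
import json
from collections import Counter

# Fields every finding is described as carrying; "magic" is optional.
REQUIRED_FIELDS = {"path", "match_type", "suffix", "size"}
MATCH_TYPES = {"extension", "magic_header"}


def load_findings(raw: str) -> list:
    """Parse scanner stdout and sanity-check each finding."""
    data = json.loads(raw)
    findings = data["findings"]
    for finding in findings:
        missing = REQUIRED_FIELDS - finding.keys()
        if missing:
            raise ValueError(f"{finding.get('path', '?')}: missing {sorted(missing)}")
        if finding["match_type"] not in MATCH_TYPES:
            raise ValueError(f"{finding['path']}: unexpected match_type {finding['match_type']!r}")
    return findings


def summarize(findings: list) -> str:
    """One-line summary; falls back to the suffix when no magic was recorded."""
    if not findings:
        return "No binary files detected"
    kinds = Counter(f.get("magic", f["suffix"] or "unknown") for f in findings)
    parts = ", ".join(f"{n} {kind}" for kind, n in sorted(kinds.items()))
    return f"{len(findings)} binaries ({parts})"
```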
The `--stage-to` flag copies detected binaries into a staging directory, preserving their relative paths, for malcontent analysis in the next step.
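Preserving relative paths presumably matters because the triage rules key on directory names such as `third_party/` and `tests/`: a staged copy at the same relative path maps a malcontent result straight back to its repository location. A hypothetical `stage_findings` helper, assuming findings carry repo-relative POSIX paths, could look like this:

```python
import shutil
from pathlib import Path, PurePosixPath


def stage_findings(repo: Path, findings: list, staging: Path) -> list:
    """Copy each detected binary into `staging` at its repo-relative path."""
    staged = []
    for finding in findings:
        rel = PurePosixPath(finding["path"])  # ASSUMPTION: repo-relative POSIX path
        if rel.is_absolute() or ".." in rel.parts:
            raise ValueError(f"refusing to stage path outside the repo: {rel}")
        dest = staging.joinpath(*rel.parts)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(repo.joinpath(*rel.parts), dest)  # also copies mtime and mode bits
        staged.append(dest)
    return staged
```

Rejecting absolute paths and `..` components keeps a malformed finding from writing outside the staging directory.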
If the scanner finds zero binaries, skip to the Output section and note "No binary files detected" in the report.
Run malcontent analysis on the staged binaries:
```bash
./scripts/run_malcontent.py "$STAGING_DIR"
malcontent_exit=$?
```
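Check the exit code before proceeding:

- Exit code 2: malcontent (`mal`) is not installed. Do not fail. Proceed to triage using only the binary scan metadata (extension, magic header, size), and note "malcontent unavailable" in the report output.
- Exit code 1: malcontent timed out or produced invalid JSON. On a timeout, report partial results and give the affected binaries a REVIEW verdict; on invalid JSON, triage using the scan metadata only and note the malcontent output error.

When malcontent is unavailable or fails, the binary scan findings alone still provide value: file paths, types, and sizes are sufficient for the deterministic triage rules that do not depend on malcontent risk levels.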
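The exit-code handling can be expressed as a small lookup that decides both the report's "Malcontent status" wording and how much of malcontent's output triage may use. This is a sketch under two assumptions: exit code 0 means success, and a hypothetical `timed_out` flag (for example, derived from the script's error output) distinguishes a timeout from invalid JSON, since both return exit code 1. Unknown codes fall back to metadata-only triage rather than failing the audit.

```python
from typing import NamedTuple


class MalcontentOutcome(NamedTuple):
    status: str       # wording for the report's "Malcontent status" line
    use_results: str  # "all", "partial", or "none" (scan metadata only)


def interpret_exit(exit_code: int, timed_out: bool = False) -> MalcontentOutcome:
    """Map run_malcontent.py's exit code to a report status and triage mode."""
    if exit_code == 0:  # ASSUMPTION: 0 means malcontent completed normally
        return MalcontentOutcome("ran successfully", "all")
    if exit_code == 2:  # mal is not installed
        return MalcontentOutcome("unavailable: findings based on binary scan only", "none")
    if exit_code == 1 and timed_out:  # ASSUMPTION: caller detects the timeout
        return MalcontentOutcome("timed out: partial results", "partial")
    if exit_code == 1:  # invalid JSON from malcontent
        return MalcontentOutcome("output error: findings based on binary scan only", "none")
    # Unknown code: never fail the audit; fall back to scan metadata.
    return MalcontentOutcome(
        f"failed (exit code {exit_code}): findings based on binary scan only", "none")
```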
Review binary findings in context. Read relevant source files to understand the purpose of detected binaries. Triage proceeds in two stages: deterministic rules first, then AI reasoning for anything unresolved.
Apply the following rules before any AI reasoning. These handle the most common clear-cut cases and make the triage reproducible.
| Condition | Verdict |
|-----------|---------|
| Binary is under third_party/, vendor/, or extern/ and malcontent risk ≤ medium | PASS — vendored dependency |
| Binary is under test/, tests/, benchmarks/, or examples/ and malcontent risk ≤ medium | PASS — test data |
| Binary suffix is .ptx, .cubin, .fatbin, or path contains triton/ or cuda | PASS — GPU kernel |
| Malcontent risk is critical | BLOCK |
| Malcontent flags remote_access, exfiltration, or backdoor capabilities | BLOCK |
| Binary has no malcontent findings and is only detected by extension/magic header | PASS — opaque-only |
| Malcontent timed out | REVIEW — partial results, manual inspection recommended |
When multiple findings produce different verdicts, the overall precedence is BLOCK > REVIEW > PASS — the most severe verdict wins.
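The Stage 1 table and the precedence rule can be encoded as a small rule engine. In this sketch every row is evaluated and the most severe match wins, which extends the stated BLOCK > REVIEW > PASS precedence to rows that overlap on a single binary (for example, a `.cubin` with critical risk). The finding fields `risk`, `capabilities`, and `timed_out`, the lowercase risk labels, and the reading of "under `third_party/`" as "any parent directory" are all assumptions; a return value of `None` means the finding goes to Stage 2.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional

# ASSUMPTION: malcontent risk labels normalized to lowercase, lowest to highest.
RISK_ORDER = ("none", "low", "medium", "high", "critical")
# Precedence from the skill: BLOCK > REVIEW > PASS.
SEVERITY = {"PASS": 0, "REVIEW": 1, "BLOCK": 2}

VENDOR_DIRS = {"third_party", "vendor", "extern"}
TEST_DIRS = {"test", "tests", "benchmarks", "examples"}
GPU_SUFFIXES = {".ptx", ".cubin", ".fatbin"}
BLOCK_CAPABILITIES = {"remote_access", "exfiltration", "backdoor"}


def most_severe(verdicts: Iterable[str]) -> Optional[str]:
    """Pick the most severe verdict, or None if there are none."""
    return max(verdicts, key=SEVERITY.__getitem__, default=None)


def stage1_verdict(finding: dict) -> Optional[str]:
    """Apply the Stage 1 table to one finding; None means 'go to Stage 2'."""
    path = finding["path"]
    pure = PurePosixPath(path)
    dirs = set(pure.parts[:-1])  # ASSUMPTION: "under X/" means any parent dir
    risk = finding.get("risk")   # None: malcontent reported nothing
    capabilities = set(finding.get("capabilities", ()))
    at_most_medium = risk is None or RISK_ORDER.index(risk) <= RISK_ORDER.index("medium")

    matches = []
    if dirs & VENDOR_DIRS and at_most_medium:
        matches.append("PASS")    # vendored dependency
    if dirs & TEST_DIRS and at_most_medium:
        matches.append("PASS")    # test data
    if pure.suffix in GPU_SUFFIXES or "triton/" in path or "cuda" in path:
        matches.append("PASS")    # GPU kernel (literal substring match)
    if risk == "critical" or capabilities & BLOCK_CAPABILITIES:
        matches.append("BLOCK")
    if risk is None and not capabilities:
        matches.append("PASS")    # opaque-only
    if finding.get("timed_out"):
        matches.append("REVIEW")  # partial results
    return most_severe(matches)


def overall_verdict(verdicts: Iterable[str]) -> str:
    """Overall result across findings; an empty set counts as PASS (assumption)."""
    return most_severe(verdicts) or "PASS"
```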
Any finding not resolved by Stage 1 proceeds to Stage 2.
For findings that remain unresolved after the deterministic rules, use AI reasoning to classify each as BLOCK, REVIEW, or PASS, the same verdicts used in the report tables.
Remove the staging directory when analysis is complete:

```bash
if [ -n "${STAGING_DIR}" ] && [ -d "${STAGING_DIR}" ]; then
  rm -rf -- "${STAGING_DIR}"
fi
```
Produce the following markdown section:
```markdown
## Binary Scan

**Binaries detected:** {N}
**Malcontent status:** {ran successfully | unavailable — findings based on binary scan only | timed out — partial results}

### BLOCK Findings

| File | Type | Size | Malcontent Risk | Capabilities | Triage |
|------|------|------|-----------------|--------------|--------|
| src/lib/backdoor.so | ELF | 24KB | critical | remote_access, exfiltration | BLOCK — critical risk with network capabilities |

### REVIEW Findings

(same table format)

### PASS Findings

(same table format, brief — included for completeness but de-emphasized)
```
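Rendering the section from triaged findings is mechanical. The sketch below assumes each finding already carries a `verdict` and a complete `triage` string, formats sizes in binary kilobytes, and prints "None." under an empty section. Those choices, and the names `render_section`, `render_row`, and `format_size`, are illustrative rather than prescribed by the skill.

```python
HEADER = ("| File | Type | Size | Malcontent Risk | Capabilities | Triage |\n"
          "|------|------|------|-----------------|--------------|--------|")


def format_size(n: int) -> str:
    """Human-readable size; ASSUMPTION: 1KB = 1024 bytes, rounded down."""
    if n < 1024:
        return f"{n}B"
    if n < 1024 ** 2:
        return f"{n // 1024}KB"
    return f"{n // 1024 ** 2}MB"


def render_row(finding: dict) -> str:
    """One table row; 'n/a' marks fields malcontent did not provide."""
    caps = ", ".join(finding.get("capabilities", ())) or "n/a"
    cells = [finding["path"], finding.get("magic", finding["suffix"]),
             format_size(finding["size"]), finding.get("risk") or "n/a",
             caps, finding["triage"]]
    return "| " + " | ".join(cells) + " |"


def render_section(findings: list, malcontent_status: str) -> str:
    """Assemble the Binary Scan section, most severe verdicts first."""
    lines = ["## Binary Scan", "",
             f"**Binaries detected:** {len(findings)}",
             f"**Malcontent status:** {malcontent_status}"]
    for verdict in ("BLOCK", "REVIEW", "PASS"):
        rows = [f for f in findings if f["verdict"] == verdict]
        lines += ["", f"### {verdict} Findings", ""]
        if rows:
            lines += [HEADER] + [render_row(f) for f in rows]
        else:
            lines.append("None.")  # ASSUMPTION: how an empty section is shown
    return "\n".join(lines)
```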
The risk_rating for this phase is one of:
If `output_file` is provided, write the file with `RISK_RATING:<value>` as its first line, followed by a blank line and then the markdown section above. If `output_file` is not provided, return the report section inline.
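The file contract is simple enough for the orchestrator to read back with a single `readline`. A minimal sketch, with hypothetical helper names `emit_report` and `read_risk_rating`:

```python
from pathlib import Path
from typing import Optional

PREFIX = "RISK_RATING:"


def emit_report(section: str, risk_rating: str, output_file: Optional[str] = None) -> Optional[str]:
    """Write RISK_RATING line, blank line, then the section; else return it inline."""
    if output_file is None:
        return section
    Path(output_file).write_text(f"{PREFIX}{risk_rating}\n\n{section}\n", encoding="utf-8")
    return None


def read_risk_rating(output_file: str) -> str:
    """Orchestrator side: read only the first line of the report file."""
    with open(output_file, encoding="utf-8") as fh:
        first = fh.readline().rstrip("\n")
    if not first.startswith(PREFIX):
        raise ValueError(f"{output_file}: first line is not {PREFIX}<value>")
    return first[len(PREFIX):]
```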
| Scenario | Behavior |
|----------|----------|
| Binary scan finds zero binaries | Note "No binary files detected"; `risk_rating` = `no_issues` |
| Malcontent unavailable (exit code 2) | Triage binary findings using scan metadata only (extension, magic header, size); note malcontent was unavailable |
| Malcontent times out (exit code 1) | Report partial results; note timeout; REVIEW verdict for affected binaries |
| Malcontent produces invalid JSON (exit code 1) | Triage binary findings using scan metadata only; note malcontent output error |