skills/q-scholar/q-multimodal/SKILL.md
Extract visual, video, and audio features from media. Use for pixel features (Pillow), video frames (FFmpeg+Pillow), audio features (openSMILE), and visual semantic analysis (Gemini API batch or standard).
npx skillsauth add TyrealQ/q-skills q-multimodalInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Multimodal media analysis: local low-level features (Pillow, openSMILE), mid/high-level visual semantic analysis (Gemini API). Local pipelines are fully generic and CLI-driven. Gemini pipelines are config-driven: copy scripts/gemini/pipeline_config.py to your project, customize, and run with --config <path>.
Do this once when adopting the skill in a new project. The canonical layout is a target only for scripts/ and output/ — user assets (input data, media, .env, system prompt) can stay wherever they already live; point the scripts at them via absolute paths.
<BASE_DIR>: read the project's CLAUDE.md if it exists. If BASE_DIR isn't defined, ask the user which directory is the project root..env with GOOGLE_API_KEY1-4 (Gemini only; check project root, home directory, common locations)pipeline_config.py fields or CLI --input / --base-dir arguments to the absolute paths you found. Never move user data without explicit confirmation.scripts/ and output/ under <BASE_DIR>. Copy the pipelines actually being used from ${SKILL_DIR}/scripts/ into <BASE_DIR>/scripts/:
pillow/, opensmile/, common.pygemini/batch/, gemini/standard/, gemini/pipeline_config.py (template → adapt in place or copy to <BASE_DIR>/scripts/pipeline_config.py)output/ is auto-created by scripts on first runpython -c "import pandas as pd; print(list(pd.read_<FORMAT>('INPUT', nrows=1).columns))"--id-cols with the user. By default only the identifier column (from --file-col) is kept; the user may want additional columns carried through to the output.Read the relevant reference file before executing a pipeline. These contain all flags, output column definitions, edge cases, and validation rules.
Local pipelines:
references/image-visual-features.md — all feature categories, column definitions, computation notesreferences/video-visual-features.md — frame extraction, aggregation logic, dual output formatreferences/audio-features.md — openSMILE feature sets, interpretable scores, feature levelsGemini pipelines:
references/gemini-batch-workflow.md — full 6-step batch pipeline, retry workflow, error handlingreferences/gemini-standard.md — standard pipeline details, model config, adapting for new projectsreferences/multi-key-management.md — multi-key quota strategy, retry threshold decision tableShared:
references/checkpoint-format.md — column order, validation rules, output directory structure| Pipeline | Python packages | System |
|----------|----------------|--------|
| Image visual | Pillow, numpy, pandas, tqdm, openpyxl | — |
| Video visual | (same as image) + scenedetect[opencv] | ffmpeg on PATH (for --extractor ffmpeg) |
| Audio | opensmile, pandas, tqdm, openpyxl | ffmpeg on PATH |
| Gemini | google-genai, python-dotenv (+ above) | .env with GOOGLE_API_KEY1-4 |
Script path = ${SKILL_DIR}/scripts/<path>. Read the pipeline's reference file before running.
| Script | Input | Output | Reference |
|--------|-------|--------|-----------|
| pillow/visual_features.py | Images | 47 pixel features (color, texture, spatial, quality) | image-visual-features.md |
| pillow/video_features.py | Videos | Frame-level + video-level aggregated features (scene-based extraction by default, FFmpeg fixed-interval optional) | video-visual-features.md |
| opensmile/audio_features.py | Video/audio | 8 interpretable scores + raw openSMILE features | audio-features.md |
Shared utilities: common.py — read_input(), save_excel(), derive_subject(), merge_checkpoints()
Command pattern: python <script> --input <file> --base-dir <root> [--features ...] [--id-cols ...] [--subjects ...] [--preview] [--merge]
Both pipelines read a pipeline_config.py file that defines paths, schema, metadata formatting, and validation rules. Copy scripts/gemini/pipeline_config.py to your project and customize.
Standard (gemini/standard/gemini_standard.py): inline media, 25 workers, auto-retry. See gemini-standard.md.
Batch (gemini/batch/[0-5]*.py + utils.py): 6-step pipeline, 50% discount. URIs expire after 48 hours. See gemini-batch-workflow.md.
python 0uploadMedia.py --config /path/to/config.py --submit --max-batch-gb 2 --key 1
python 3checkStatus.py --config /path/to/config.py --poll
python 4retryErrors.py --config /path/to/config.py --preview
# >500 failures: batch retry
python 4retryErrors.py --config /path/to/config.py --submit
python 3checkStatus.py --config /path/to/config.py --poll
python 4retryErrors.py --config /path/to/config.py --collect
# <=500 failures or after batch retries: live fallback
python 4retryErrors.py --config /path/to/config.py --standard
python 5review.py --config /path/to/config.py --merge
Decision: >5 GB or >10 subjects and not time-sensitive → batch. Otherwise → standard. See gemini-standard.md.
Multi-key: Each GOOGLE_API_KEY{N} = 20 GB quota. See multi-key-management.md.
Local pipelines (Pillow, openSMILE): No modification needed. All project-specific values come from CLI args.
Gemini pipelines: Config-driven, no script modification needed. Scripts are copied to the project in step 2 above, then:
<BASE_DIR>/scripts/pipeline_config.py (already copied from template in step 2)BASE_DIR, INPUT_PATH, SYSTEM_PROMPT_PATH to your project paths (SYSTEM_PROMPT_PATH relative to BASE_DIR, e.g., scripts/<prompt>.txt)GROUP_COL, FILE_COL, ANALYSIS_FIELDS to match your input schema and system promptsubject_id() and format_metadata() for your domainvalidate_row() for field-specific validation rules--config <BASE_DIR>/scripts/pipeline_config.pyInclude: Image/video/audio feature extraction, Gemini visual semantic analysis, batch job management, checkpoint merging, multi-key quota management.
Exclude: ML model training, deep learning inference, real-time streaming analysis.
--id-cols and --features with user--preview dry run confirms expected subjects and counts.env with API keys, system prompt file createdtesting
Capture session decisions, conventions, and lessons into plan files, auto-memory, and CLAUDE.md so a fresh session resumes cleanly. Use for "hand off," "wrap up," "update docs for next session," or before /compact.
data-ai
Run exploratory data analysis on tabular datasets with measurement-appropriate statistics. Use for EDA, descriptive statistics, data exploration, or preparing data summaries for reports and manuscripts.
research
Orchestrate end-to-end academic manuscript preparation following APA 7th edition. Use for writing papers, drafting sections, or academic writing support.
development
Generate professional slide deck images from content with smart logo branding. Use for creating slides, presentations, decks, or PPT output.