skills/beam/beam-tools/select-llm-model/SKILL.md
Select optimal LLM model for Beam AI tools based on cost, context, and performance. Load when user says "select llm model", "choose model", "optimize llm", "which model for", "model recommendation", "best model for agent", "llm cost optimization", or needs help selecting the right AI model for their Beam tools.
npx skillsauth add beam-ai-team/beam-next-skills select-llm-model
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Find the most cost-effective LLM model for your Beam AI tools without sacrificing reliability.
Required: Your Beam.ai personal API key
Add to .env file at project root:
# Your Beam.ai Personal API Key (from Beam Settings → API)
BEAM_API_KEY=your_personal_api_key_here
# Workspace ID is extracted from task URLs automatically
# Or set it manually if preferred:
# BEAM_WORKSPACE_ID=your_workspace_id
Don't have an API key? Get it from: Beam Settings → API → Create Personal API Key
For manual input there are no prerequisites: just provide your prompt and examples directly.
Dependencies: pip install requests python-dotenv
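Beam uses two-step authentication: first exchange your personal API key for a short-lived access token, then send that token as a Bearer header on API calls.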
curl -X POST https://api.beamstudio.ai/auth/access-token \
-H "Content-Type: application/json" \
-d '{"apiKey": "YOUR_PERSONAL_API_KEY"}'
Response:
{
"idToken": "eyJhbG...", // Use this as Bearer token
"refreshToken": "eyJhbG...",
"expiresIn": 600 // Token lasts 10 minutes
}
curl -X GET "https://api.beamstudio.ai/agent-tasks/{TASK_ID}" \
-H "Authorization: Bearer {idToken}" \
-H "current-workspace-id: {WORKSPACE_ID}"
Workspace ID: First UUID in the task URL
Example: https://app.beam.ai/0853f6d5-c912-4a47-8101.../tasks/...
Workspace ID = 0853f6d5-c912-4a47-8101-c965c282d46f
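The "first UUID in the task URL" rule can be sketched in Python as follows. This is a minimal sketch, assuming task URLs follow the `https://app.beam.ai/{workspace_id}/{agent_id}/tasks/{task_id}` shape; `parse_task_url` is a hypothetical helper name, not part of any Beam SDK, and the agent ID in the usage note is made up.

```python
import re
from urllib.parse import urlparse

# Matches a canonical 8-4-4-4-12 hexadecimal UUID.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def parse_task_url(url: str) -> dict:
    """Return the workspace ID (first UUID in the path) and the task ID.

    The task ID is the UUID directly after "/tasks/"; it is None when the
    URL has no such segment.
    """
    path = urlparse(url).path
    uuids = UUID_RE.findall(path)
    if not uuids:
        raise ValueError(f"No UUID found in URL path: {path!r}")
    task_match = re.search(r"/tasks/(" + UUID_RE.pattern + r")", path)
    return {
        "workspace_id": uuids[0],
        "task_id": task_match.group(1) if task_match else None,
    }
```

Usage, with a made-up agent ID in the middle segment:

```python
ids = parse_task_url(
    "https://app.beam.ai/0853f6d5-c912-4a47-8101-c965c282d46f/"
    "11111111-2222-3333-4444-555555555555/tasks/"
    "783221fe-fdde-4086-987e-544ca216a31c"
)
print(ids["workspace_id"], ids["task_id"])
```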
Ask the user which input method they prefer:
Option A: Beam Task URL/ID
- Task URL: https://app.beam.ai/{workspace_id}/{agent_id}/tasks/{task_id}
- Task ID: 783221fe-fdde-4086-987e-544ca216a31c

Option B: Manual Input
- Prompt (with variables such as {{input_data}})

If using Beam Task:
ACCESS_TOKEN=$(curl -s -X POST "https://api.beamstudio.ai/auth/access-token" \
-H "Content-Type: application/json" \
-d '{"apiKey": "YOUR_API_KEY"}' | jq -r '.idToken')
curl -s -X GET "https://api.beamstudio.ai/agent-tasks/{TASK_ID}" \
-H "Authorization: Bearer $ACCESS_TOKEN" \
-H "current-workspace-id: {WORKSPACE_ID}" \
-o /tmp/beam_task_{TASK_ID}.json
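For scripted use, the two curl calls above might look like this in Python. This is a sketch rather than an official client: it uses the standard library's `urllib.request` (the `requests` package listed under Dependencies would work equally well), the function names are hypothetical, and the `opener` parameter exists only so the network call can be swapped out in tests. It does not handle the 10-minute token expiry.

```python
import json
import urllib.request

API_BASE = "https://api.beamstudio.ai"


def build_token_request(api_key: str) -> urllib.request.Request:
    """Step 1: POST the personal API key to obtain an idToken."""
    body = json.dumps({"apiKey": api_key}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/auth/access-token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_task_request(task_id: str, workspace_id: str, id_token: str) -> urllib.request.Request:
    """Step 2: GET the task, authenticating with the idToken as Bearer token."""
    return urllib.request.Request(
        f"{API_BASE}/agent-tasks/{task_id}",
        headers={
            "Authorization": f"Bearer {id_token}",
            "current-workspace-id": workspace_id,
        },
        method="GET",
    )


def fetch_task(api_key: str, task_id: str, workspace_id: str,
               opener=urllib.request.urlopen) -> dict:
    """Run both steps and return the parsed task JSON."""
    with opener(build_token_request(api_key)) as resp:
        id_token = json.load(resp)["idToken"]
    with opener(build_task_request(task_id, workspace_id, id_token)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the headers easy to inspect before any call is made.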
Iterate over agentTaskNodes and filter by type:

| originalTool.type | Tool Type | Action |
|---------------------|-----------|--------|
| custom_gpt_tool | Custom LLM tool | Analyze |
| gpt_tool | Built-in LLM tool | Analyze |
| beam_tool | System tool | Skip |
| custom_integration_tool | API integration | Skip |
For each LLM tool, extract from task JSON using these correct paths:
| Data Point | JSON Path |
|------------|-----------|
| Tool Name | agentTaskNodes[].agentGraphNode.toolConfiguration.toolName |
| Template Prompt | agentTaskNodes[].agentGraphNode.toolConfiguration.originalTool.prompt |
| Filled Prompt | agentTaskNodes[].toolData.filled_prompt |
| Output | agentTaskNodes[].output.value |
| Current Model | agentTaskNodes[].agentGraphNode.toolConfiguration.originalTool.preferredModel |
Important: Use filled_prompt for token calculation - this is what actually gets sent to the LLM (template + substituted variable values).
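The filtering and extraction steps above can be sketched together in Python. This assumes the task JSON has already been loaded into a dict, and that `originalTool.type` sits under `agentGraphNode.toolConfiguration` like the other `originalTool` fields (an assumption inferred from the path table, not confirmed by it). `extract_llm_tools` is a hypothetical helper name.

```python
# Tool types that wrap an LLM call and are therefore worth analyzing.
LLM_TOOL_TYPES = {"custom_gpt_tool", "gpt_tool"}


def extract_llm_tools(task: dict) -> list[dict]:
    """Collect prompt, output, and model data for each LLM tool node."""
    tools = []
    for node in task.get("agentTaskNodes", []):
        config = (node.get("agentGraphNode") or {}).get("toolConfiguration") or {}
        original = config.get("originalTool") or {}
        # Skip system tools and API integrations.
        if original.get("type") not in LLM_TOOL_TYPES:
            continue
        tools.append({
            "tool_name": config.get("toolName"),
            "template_prompt": original.get("prompt"),
            "filled_prompt": (node.get("toolData") or {}).get("filled_prompt"),
            "output": (node.get("output") or {}).get("value"),
            "current_model": original.get("preferredModel"),
        })
    return tools
```

Missing keys yield `None` instead of raising, so a partially populated node still shows up in the result and can be inspected.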
Calculate tokens for each tool:
Input Tokens = len(filled_prompt) / 3.5
Output Tokens = len(json.dumps(output)) / 3.5
Total Context = Input + Output
Buffer = Total × 1.2 (20% safety margin)
Token estimation by content type:

| Content Type | Chars per Token |
|--------------|-----------------|
| JSON/Code | ~3.5 |
| Plain English | ~4.0 |
| Mixed content | ~3.7 |
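The estimation formulas above can be sketched in Python as follows. The character-per-token ratios come from the table; rounding up to whole tokens is an assumption added here to keep the estimate conservative, and the function names are hypothetical.

```python
import json
import math

# Approximate characters per token, by content type (from the table above).
CHARS_PER_TOKEN = {"json": 3.5, "english": 4.0, "mixed": 3.7}


def estimate_tokens(text: str, content_type: str = "json") -> int:
    """Estimate the token count from character length, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN[content_type])


def estimate_context(filled_prompt: str, output, content_type: str = "json",
                     safety_margin: float = 1.2) -> dict:
    """Estimate input, output, and total tokens plus a buffered total."""
    input_tokens = estimate_tokens(filled_prompt, content_type)
    output_tokens = estimate_tokens(json.dumps(output), content_type)
    total = input_tokens + output_tokens
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total": total,
        # 20% safety margin by default.
        "buffered": math.ceil(total * safety_margin),
    }
```

These are character-based approximations; an exact count would need the target model's own tokenizer.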
Don't just recommend the cheapest model that fits the context. Assess task complexity to ensure quality:
| Level | Characteristics | Recommended Model |
|-------|-----------------|-------------------|
| LOW | Boolean output, simple ID consolidation | GPT-4o-mini |
| LOW-MEDIUM | Data extraction (accuracy-critical) | Gemini 3 Flash |
| MEDIUM | Classification, basic analysis, structured reasoning | Gemini 3 Flash |
| MEDIUM-HIGH | Pattern recognition, sentiment analysis, multi-factor reasoning | Gemini 3 Flash |
| HIGH | Customer-facing content generation, complex reasoning, nuanced writing | Gemini 3 Flash or GPT-5 |
Analyze the prompt and output to classify:
| Task Type | Prompt Keywords | Output Type | Complexity | Notes |
|-----------|-----------------|-------------|------------|-------|
| Data Extraction | "extract", "parse", "get fields" | JSON fields | LOW-MEDIUM | Use Gemini 3 Flash - GPT-4o-mini has accuracy issues |
| Consolidation | "consolidate", "combine", "merge" | Lists, aggregated data | LOW | GPT-4o-mini OK for simple lists |
| Classification | "classify", "categorize", "determine" | Category/label | MEDIUM | Gemini 3 Flash recommended |
| Evaluation | "evaluate", "compare", "assess" | Boolean or score | LOW | GPT-4o-mini OK for yes/no |
| Analysis | "analyze", "identify patterns", "risk factors" | Rich insights | MEDIUM-HIGH | Gemini 3 Flash |
| Content Generation | "generate email", "write", "compose" | Customer-facing text | HIGH | Gemini 3 Flash or GPT-5 |
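A first-pass version of this keyword classification might look like the sketch below. It is a rough heuristic, not a definitive classifier: the keywords and complexity levels come from the table, but the rule ordering (highest complexity first, so ambiguous prompts lean toward the more capable model) and the plain substring matching are assumptions made here. `classify_task` is a hypothetical name.

```python
# (task type, prompt keywords, complexity), ordered from highest to lowest
# complexity so that a prompt matching several rules gets the stricter one.
TASK_RULES = [
    ("Content Generation", ("generate email", "write", "compose"), "HIGH"),
    ("Analysis", ("analyze", "identify patterns", "risk factors"), "MEDIUM-HIGH"),
    ("Classification", ("classify", "categorize", "determine"), "MEDIUM"),
    ("Data Extraction", ("extract", "parse", "get fields"), "LOW-MEDIUM"),
    ("Consolidation", ("consolidate", "combine", "merge"), "LOW"),
    ("Evaluation", ("evaluate", "compare", "assess"), "LOW"),
]


def classify_task(prompt: str):
    """Return (task_type, complexity) for the first matching rule, else None."""
    text = prompt.lower()
    for task_type, keywords, complexity in TASK_RULES:
        if any(keyword in text for keyword in keywords):
            return task_type, complexity
    return None
```

Substring matching will misfire on words such as "rewrite" or "sparse", so the result is best treated as a starting point and confirmed against the tool's actual output.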
Key Learning: GPT-4o-mini is cheap but may miss extraction details. For business-critical data extraction, use Gemini 3 Flash - it's only ~$0.50/$3.00 per 1M tokens and significantly more accurate.
| Scenario | GPT-4o-mini | Gemini 3 Flash |
|----------|-------------|----------------|
| Boolean decisions | OK | Overkill |
| ID list consolidation | OK | Overkill |
| Field extraction from structured data | Accuracy issues | Recommended |
| Multi-field extraction | Not recommended | Recommended |
| Pattern analysis | Not recommended | Recommended |
Examine the actual output to verify complexity:
| Output Type | Example | Safe for GPT-4o-mini? |
|-------------|---------|----------------------|
| Boolean | {"should_update": false} | YES |
| ID List | {"ticket_ids": "123, 456, 789"} | YES |
| Simple JSON (1-2 fields) | {"status": "active"} | MAYBE - test first |
| Multi-field extraction | {"name": "...", "email": "...", "date": "..."} | NO - use Gemini 3 Flash |
| Rich Analysis | {"risk_factors": [...], "sentiment": "..."} | NO - use Gemini 3 Flash |
| Generated Text | {"email_body": "Dear customer..."} | NO - use better model |
For each tool, provide:
TOOL: {tool_name}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Current Model: {current_model}
Input Tokens: {input_tokens}
Output Tokens: {output_tokens}
Total Context: {total_tokens}
Task Analysis:
Type: {extraction|consolidation|classification|evaluation|analysis|generation}
Complexity: {LOW|LOW-MEDIUM|MEDIUM|MEDIUM-HIGH|HIGH}
Output: {description of output type}
RECOMMENDED: {model_name}
Confidence: {HIGH|MEDIUM|LOW}
Reason: {why this model fits the task}
Cost: ${cost}/1K calls
Current vs Recommended:
Current: ${current_cost}/1K
Recommended: ${new_cost}/1K
Savings: {percentage}%
Present final recommendations in a table:
| Tool | Tokens (in/out) | Complexity | Recommended | Confidence | Cost/1K | Savings |
|------|-----------------|------------|-------------|------------|---------|---------|
| Tool A | 1,000 / 200 | LOW | GPT-4o-mini | HIGH | $0.27 | 94% |
| Tool B | 76,000 / 600 | MEDIUM-HIGH | Gemini 3 Flash | MEDIUM | $40.00 | 83% |
Ask user:
"Would you like me to delete the temporary task data file? (y/n)"
If yes: rm /tmp/beam_task_<task_id>.json
| Finding | Source |
|---------|--------|
| Gemini 3 Flash has 15% better extraction accuracy than Gemini 2.5 Flash on hard tasks | Artificial Analysis |
| Gemini 3 Flash competes with GPT-5.2 in benchmarks; GPT-4o-mini is 17 months older | Engadget |
| For document extraction, Gemini Flash "extracts the most detailed information and achieves the best accuracy" | Medium - UnderDoc |
| GPT-4o-mini hallucination rate: ~1.69% on simple tasks, but higher on extraction | All About AI |
| Gemini 3 Flash: 218 tokens/sec vs GPT-4o-mini: 85 tokens/sec | DocsBot |
| Task Type | Best Model | Why | Cost/1M (in/out) |
|-----------|------------|-----|------------------|
| Boolean decisions | GPT-4o-mini | Simple, cheap, sufficient | $0.15/$0.60 |
| ID list consolidation | GPT-4o-mini | Simple aggregation | $0.15/$0.60 |
| Data extraction | Gemini 3 Flash | 15% better accuracy, handles complex docs | $0.50/$3.00 |
| Classification | Gemini 3 Flash | Good reasoning at low cost | $0.50/$3.00 |
| Pattern analysis | Gemini 3 Flash | Strong on complex reasoning | $0.50/$3.00 |
| Content generation | GPT-5 or Gemini 3 Flash | Best writing quality | $1.25/$10 or $0.50/$3.00 |
| Complex reasoning | GPT-5.2 or Gemini 3 Pro | Highest benchmark scores | $1.75/$14 or $3.00/$15 |
| Very large context (>200K) | Gemini 2.5 Pro or GPT-4.1 | 1M context window | $1.25/$10 or $2.00/$8.00 |
Example: Data Extraction Tool (946 input / 210 output tokens)
| Model | Cost per 1K calls | Accuracy |
|-------|-------------------|----------|
| GPT-4o-mini | $0.27 | Lower (fields missed) |
| Gemini 3 Flash | $1.10 | Higher (15%+ improvement) |
Delta: $0.83 per 1K calls for significantly better accuracy - worth it for business-critical extraction.
GPT-4o-mini is 17 months older than Gemini 3 Flash (Oct 2023 vs Jan 2025 training data). Gemini 3 Flash now competes with GPT-5.2 level performance at Flash pricing, making it the best value for most tasks except simple boolean/list operations.
| Complexity | Primary Choice | Alternative | Avoid |
|------------|----------------|-------------|-------|
| LOW (boolean, ID lists) | GPT-4o-mini | Gemini 3 Flash | Opus, GPT-5 (overkill) |
| LOW-MEDIUM (extraction) | Gemini 3 Flash | GPT-5 | GPT-4o-mini (accuracy issues) |
| MEDIUM | Gemini 3 Flash | GPT-5 | GPT-4o-mini |
| MEDIUM-HIGH | Gemini 3 Flash | GPT-5 | GPT-4o-mini (quality risk) |
| HIGH | Gemini 3 Flash | GPT-5, Claude 4.5 Sonnet | GPT-4o-mini (quality risk) |
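Only use GPT-4o-mini for:
- Boolean decisions (should_update: true/false)
- ID list consolidation

Do NOT use GPT-4o-mini for:
- Field or multi-field extraction
- Pattern analysis
- Customer-facing content generation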
| Context Need | Best Options |
|--------------|--------------|
| <50K tokens | GPT-4o-mini, Gemini 3 Flash |
| 50K-128K | GPT-4o-mini, Gemini 3 Flash, Claude 4.5 Sonnet |
| 128K-200K | Claude 4.5 Sonnet, GPT-5, Gemini 2.5 Pro |
| 200K-1M | GPT-4.1, Gemini 2.5 Pro, Gemini 3 Pro |
| Model | Input/1M | Output/1M | Context | Best For |
|-------|----------|-----------|---------|----------|
| GPT-4o-mini | $0.15 | $0.60 | 128K | Low complexity, high volume |
| Gemini 3 Flash | $0.50 | $3.00 | 1M | Medium-high complexity, fast |
| GPT-5 | $1.25 | $10.00 | 400K | High complexity, best value |
| Claude 4.5 Sonnet | $3.00 | $15.00 | 200K | Balanced, caching available |
| Gemini 3 Pro | $3.00 | $15.00 | 1M | Large context analysis |
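The pricing table can be combined with the context estimate to shortlist models. The sketch below filters out models whose context window is too small for the buffered token count, then sorts the rest by per-call cost. It assumes "128K" means 128,000 tokens and "1M" means 1,000,000; `models_that_fit` is a hypothetical helper name.

```python
# Prices are dollars per 1M tokens; context is the window size in tokens.
MODELS = [
    {"name": "GPT-4o-mini", "input": 0.15, "output": 0.60, "context": 128_000},
    {"name": "Gemini 3 Flash", "input": 0.50, "output": 3.00, "context": 1_000_000},
    {"name": "GPT-5", "input": 1.25, "output": 10.00, "context": 400_000},
    {"name": "Claude 4.5 Sonnet", "input": 3.00, "output": 15.00, "context": 200_000},
    {"name": "Gemini 3 Pro", "input": 3.00, "output": 15.00, "context": 1_000_000},
]


def models_that_fit(input_tokens: int, output_tokens: int,
                    safety_margin: float = 1.2) -> list[dict]:
    """Return models whose context fits the buffered total, cheapest first."""
    needed = (input_tokens + output_tokens) * safety_margin
    fitting = [m for m in MODELS if m["context"] >= needed]
    # Relative per-call cost; the shared 1/1M factor does not affect ordering.
    return sorted(
        fitting,
        key=lambda m: input_tokens * m["input"] + output_tokens * m["output"],
    )
```

This ranks on context and cost only. The cheapest fitting model still has to be checked against the task's complexity level before it is recommended.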
| Beam ID | Model Name |
|---------|------------|
| GPT4O_MINI | GPT-4o-mini |
| GPT40 | GPT-4o |
| GPT41 | GPT-4.1 |
| GPT5 | GPT-5 |
| GPT52 | GPT-5.2 |
| GEMINI_3_FLASH | Gemini 3 Flash |
| GEMINI_25_PRO | Gemini 2.5 Pro |
| GEMINI_3_PRO | Gemini 3 Pro |
| CLAUDE_35_SONNET | Claude 3.5 Sonnet |
| CLAUDE_45_SONNET | Claude 4.5 Sonnet |
| CLAUDE_45_OPUS | Claude 4.5 Opus |
- … (/tasks/ in URL).
- … (api.beamstudio.ai), not BID staging.

- beam-get-task-details - Fetch task data from Beam
- design-beam-agent - Design new agent architecture
- beam-get-agent-analytics - Analyze agent performance

See references/llm-models.md for complete specifications.