skills/beam/beam-tools/evaluate-solutions-case-study/SKILL.md
Load when user says 'evaluate case study', 'review candidate submission', 'analyze case study', 'candidate strengths and weaknesses', 'assess candidate work', or 'evaluate solutions engineer candidate'. Systematically evaluate Solutions AI Engineer candidate case study submissions by analyzing extraction prompt quality, project plan depth, and presentation/communication.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

`npx skillsauth add beam-ai-team/beam-next-skills evaluate-solutions-case-study`
Systematically identify strengths and weaknesses in Solutions AI Engineer candidate case study submissions.
Evaluate candidate submissions for the Solutions AI Engineer role by analyzing three key components: extraction prompt quality, project plan depth, and presentation effectiveness. Produces a structured strengths/weaknesses analysis to support consistent hiring decisions.
Time Estimate: 15-25 minutes per candidate
Before evaluating any candidate submission, load these reference materials to understand what the candidate was asked to do:
- Case Study Requirements: `case-study-materials/case-study-requirements.md`
- Test Dataset: `case-study-materials/case-study-dataset/Mid Level AI Solution Engineer Case Study/`
  - `Test Dataset...csv` - Contains emails and expected extraction results
  - `BauerTech-GmbH.pdf`
  - `Hofbauer-Eng-GmBH.pdf`
  - `Mller-Industrial-GmBH.pdf`
  - `NORSE-tech-GmBH.PDF`
  - `NT-Precision-GmBH.pdf`
  - `Schneider-Metallbau-GmbH.pdf`
  - `Steinbach-GmBH.pdf`
  - `Technovac-GmBH.pdf`
  - `Weber-Tools-GmbH.pdf`
  - `ZMT-Precision-GmBH.pdf`
- Full Brief (optional): `case-study-materials/Solutions-Case-Study.pdf`
The candidate's prompt must extract these fields from German B2B order documents:
Buyer (3 fields):
- buyer_company_name
- buyer_person_name
- buyer_email_address

Order (5 fields):
- order_number
- order_date (German format: DD.MM.YYYY)
- delivery_address_street
- delivery_address_postal_code
- delivery_address_city

Product (3 fields, repeated per line item):
- product_position
- product_article_code
- product_quantity

Note: Orders contain 1-10 products each. A complete prompt must handle multiple products per order.
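
When comparing a candidate's prompt output against the field list above, a small checker can make gaps obvious. The following is a minimal sketch only: the field names come from the requirements, but the function name `check_extraction`, the dict shape, and the nested `products` list are assumptions made for illustration, not a format the case study prescribes.

```python
from datetime import datetime

# Field names are taken from the case study requirements.
# The nested "products" list is an assumed layout for this sketch.
BUYER_FIELDS = ("buyer_company_name", "buyer_person_name", "buyer_email_address")
ORDER_FIELDS = (
    "order_number",
    "order_date",
    "delivery_address_street",
    "delivery_address_postal_code",
    "delivery_address_city",
)
PRODUCT_FIELDS = ("product_position", "product_article_code", "product_quantity")


def check_extraction(result: dict) -> list[str]:
    """Return a list of problems found in one extracted order (empty if none)."""
    problems = []

    # All 8 buyer and order fields must be present at the top level.
    for field in BUYER_FIELDS + ORDER_FIELDS:
        if field not in result:
            problems.append(f"missing field: {field}")

    # order_date must parse as a German-style date, DD.MM.YYYY.
    date = result.get("order_date")
    if date is not None:
        try:
            datetime.strptime(date, "%d.%m.%Y")
        except (TypeError, ValueError):
            problems.append(f"order_date not in DD.MM.YYYY format: {date!r}")

    # Orders contain 1-10 products, each with all 3 product fields.
    products = result.get("products") or []
    if not 1 <= len(products) <= 10:
        problems.append(f"expected 1-10 products, got {len(products)}")
    for index, product in enumerate(products, start=1):
        for field in PRODUCT_FIELDS:
            if field not in product:
                problems.append(f"product {index}: missing field {field}")

    return problems
```

An empty return value means the output covers all 11 fields and at least one line item. It says nothing about whether the extracted values are correct, which still requires comparison against the expected results in the test dataset.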
| Score | Meaning |
|-------|---------|
| 5 | Exceptional - Exceeds expectations, demonstrates mastery |
| 4 | Strong - Fully meets expectations, minor improvements possible |
| 3 | Acceptable - Meets basic expectations, some gaps |
| 2 | Weak - Below expectations, significant gaps |
| 1 | Poor - Does not meet expectations, major issues |

**1. Prompt Engineering (from Task 1)**

| Criterion | What to Score |
|-----------|---------------|
| Field Coverage | Does the prompt address all 11 required fields? Handles multi-product orders? |
| Structure & Clarity | Clear sections, logical organization, unambiguous instructions |
| Output Format Specification | Defines exact output structure, handles edge cases (empty fields, special characters) |
| Example Quality | Provides input/output examples, demonstrates expected behavior |
| Prompt Engineering Best Practices | System/user role separation, appropriate constraints, robust to variation |

**2. Solution Design (from Task 1 thinking + Task 2)**

| Criterion | What to Score |
|-----------|---------------|
| Architecture Understanding | Grasps how LLM extraction fits into larger automation pipeline |
| Error Handling & HITL | Plans for failures, confidence thresholds, human review points |
| Scalability Thinking | Considers volume growth, new document formats, maintenance |
| UAT & Validation Approach | How to test accuracy, acceptance criteria, feedback loops |
| Technical Realism | Feasible approach, aware of LLM limitations, no magic thinking |

**3. Project Management (from Task 2)**

| Criterion | What to Score |
|-----------|---------------|
| Phase Structure | Clear milestones, logical sequencing, dependencies identified |
| Timeline Realism | Reasonable estimates, accounts for iteration and review cycles |
| Risk Identification | Anticipates what could go wrong, has mitigation strategies |
| Stakeholder Touchpoints | Built-in client communication, approval gates, expectation management |
| Resource & Scope Clarity | Clear on what's needed, explicit assumptions and trade-offs |

**4. Communication & Sales (from Task 3)**

| Criterion | What to Score |
|-----------|---------------|
| Narrative Flow | Problem → Solution → Value story, logical progression |
| Technical Translation | Explains complex concepts in client-friendly language |
| Value Proposition | Clear ROI, business benefits articulated, not just features |
| Visual Clarity | Clean slides, appropriate use of diagrams/graphics, not cluttered |
| Call to Action | Clear next steps, creates urgency or engagement path |

Category Score (out of 25): the sum of the five criterion scores in that category.
Overall Score (out of 100): the sum of the four category scores.
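
The scoring arithmetic can be sketched in a few lines. This assumes each category score is the plain sum of its five 1-5 criterion scores, and the overall score the sum of the four categories, which is what the 25- and 100-point maxima imply. The `RUBRIC` mapping and the function name `score_candidate` are hypothetical names chosen for this example.

```python
# Category and criterion names are copied from the scoring rubric.
# The data layout and function name are assumptions for this sketch.
RUBRIC = {
    "Prompt Engineering": [
        "Field Coverage",
        "Structure & Clarity",
        "Output Format Specification",
        "Example Quality",
        "Prompt Engineering Best Practices",
    ],
    "Solution Design": [
        "Architecture Understanding",
        "Error Handling & HITL",
        "Scalability Thinking",
        "UAT & Validation Approach",
        "Technical Realism",
    ],
    "Project Management": [
        "Phase Structure",
        "Timeline Realism",
        "Risk Identification",
        "Stakeholder Touchpoints",
        "Resource & Scope Clarity",
    ],
    "Communication & Sales": [
        "Narrative Flow",
        "Technical Translation",
        "Value Proposition",
        "Visual Clarity",
        "Call to Action",
    ],
}


def score_candidate(scores: dict[str, dict[str, int]]) -> tuple[dict[str, int], int]:
    """Sum criterion scores per category (max 25) and overall (max 100)."""
    totals = {}
    for category, criteria in RUBRIC.items():
        given = scores.get(category, {})
        total = 0
        for criterion in criteria:
            value = given.get(criterion)
            # Every criterion needs an integer score on the 1-5 scale.
            if not isinstance(value, int) or not 1 <= value <= 5:
                raise ValueError(
                    f"{category} / {criterion}: expected an integer 1-5, got {value!r}"
                )
            total += value
        totals[category] = total
    return totals, sum(totals.values())
```

Raising on a missing or out-of-range score, rather than treating it as zero, keeps an incomplete evaluation from silently producing a low total.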
Actions:
Evaluate the extraction prompt using the Prompt Engineering criteria from the Scoring Rubric.
Actions:
Evaluate the project plan using both Solution Design and Project Management criteria from the Scoring Rubric.
Actions:
Evaluate the presentation using the Communication & Sales criteria from the Scoring Rubric.
Actions:
Compile all scores into the final evaluation report using this format:
Report Template:
# Candidate Evaluation: [Name/ID]
**Date**: [Evaluation date]
**Evaluator**: [Your name]
---
## 1. Prompt Engineering [EMOJI] [XX/25]
| Criterion | Score | Reference |
|-----------|-------|-----------|
| Field Coverage | X/5 | [Specific reference in candidate's prompt] |
| Structure & Clarity | X/5 | [Specific reference] |
| Output Format Specification | X/5 | [Specific reference] |
| Example Quality | X/5 | [Specific reference] |
| Prompt Engineering Best Practices | X/5 | [Specific reference] |
---
## 2. Solution Design [EMOJI] [XX/25]
| Criterion | Score | Reference |
|-----------|-------|-----------|
| Architecture Understanding | X/5 | [Specific reference in candidate's plan] |
| Error Handling & HITL | X/5 | [Specific reference] |
| Scalability Thinking | X/5 | [Specific reference] |
| UAT & Validation Approach | X/5 | [Specific reference] |
| Technical Realism | X/5 | [Specific reference] |
---
## 3. Project Management [EMOJI] [XX/25]
| Criterion | Score | Reference |
|-----------|-------|-----------|
| Phase Structure | X/5 | [Specific reference in candidate's plan] |
| Timeline Realism | X/5 | [Specific reference] |
| Risk Identification | X/5 | [Specific reference] |
| Stakeholder Touchpoints | X/5 | [Specific reference] |
| Resource & Scope Clarity | X/5 | [Specific reference] |
---
## 4. Communication & Sales [EMOJI] [XX/25]
| Criterion | Score | Reference |
|-----------|-------|-----------|
| Narrative Flow | X/5 | [Specific reference in candidate's slides] |
| Technical Translation | X/5 | [Specific reference] |
| Value Proposition | X/5 | [Specific reference] |
| Visual Clarity | X/5 | [Specific reference] |
| Call to Action | X/5 | [Specific reference] |
---
## Overall Score: [EMOJI] [XX/100]
| Category | Score |
|----------|-------|
| Prompt Engineering | XX/25 |
| Solution Design | XX/25 |
| Project Management | XX/25 |
| Communication & Sales | XX/25 |
| **Total** | **XX/100** |
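
The Overall Score table in the template can be filled in mechanically once the four category totals are known. A minimal sketch follows; the function name `render_overall` is hypothetical, and the `[EMOJI]` placeholder is left untouched because this sketch does not assume any particular score-to-emoji mapping.

```python
def render_overall(totals: dict[str, int]) -> str:
    """Render the template's Overall Score section from category totals."""
    overall = sum(totals.values())
    lines = [
        # [EMOJI] stays a placeholder; the evaluator fills it in by hand.
        f"## Overall Score: [EMOJI] [{overall}/100]",
        "| Category | Score |",
        "|----------|-------|",
    ]
    # One row per category, in the order the caller supplies them.
    for category, score in totals.items():
        lines.append(f"| {category} | {score}/25 |")
    lines.append(f"| **Total** | **{overall}/100** |")
    return "\n".join(lines)
```

Passing the categories in rubric order (Prompt Engineering, Solution Design, Project Management, Communication & Sales) reproduces the row order of the template.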
Emoji Reference: