skills/alignagent-adaptive-learner-intelligence/SKILL.md
Build multi-agent adaptive learning systems that diagnose knowledge gaps and recommend targeted resources. Implements the ALIGNAgent framework: Skill Gap Agent (proficiency estimation + concept-level diagnostic reasoning) and Recommender Agent (preference-aware resource retrieval aligned to deficiencies). Trigger phrases: - "Build an adaptive learning system" - "Create a personalized tutoring agent" - "Diagnose student knowledge gaps from quiz data" - "Build a skill gap analyzer for learners" - "Create an educational recommender that adapts to student performance" - "Implement a multi-agent pipeline for personalized education"
npx skillsauth add ndpvt-web/arxiv-claude-skills alignagent-adaptive-learner-intelligenceInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This skill enables Claude to build multi-agent educational systems that close the loop between assessment, diagnosis, and intervention. Based on the ALIGNAgent framework, the approach processes learner performance data (quiz scores, gradebook records, preferences) through a Skill Gap Agent that computes topic-level proficiency scores and performs concept-level diagnostic reasoning to identify specific misconceptions, then routes deficiencies to a Recommender Agent that retrieves preference-aware learning materials. The key innovation is the continuous feedback loop: interventions occur before advancing to subsequent topics, creating an iterative remediation cycle rather than a one-shot recommendation.
Two-Agent Architecture with Continuous Feedback Loop. ALIGNAgent separates concerns into a Skill Gap Agent and a Recommender Agent connected by a structured handoff. The Skill Gap Agent computes topic-level proficiency as rho_t = (1/|I_t|) * sum(g_i for i in I_t) where I_t is the set of assessment items for topic t and g_i is the normalized score on item i. It then applies a mastery threshold tau to produce a gap set G(s) = {(t, rho_t) | rho_t < tau}, sorted by ascending proficiency so the weakest areas get attention first. Beyond this quantitative layer, the agent performs concept-level diagnostic reasoning -- analyzing patterns in incorrect answers, distractor selection frequency, and cross-item reasoning to produce natural-language explanations of why a learner is struggling (e.g., "confuses AVL rotation direction" rather than just "low score on AVL trees").
Preference-Aware Resource Retrieval. The Recommender Agent constructs structured queries from the gap set and learner preferences (video vs. text, self-paced vs. structured), retrieves candidate resources, validates them against modality preferences and topic relevance, and returns up to K resources per gap. This is diagnostic-driven retrieval -- the query is shaped by the specific misconception, not just the broad topic.
Formative Loop, Not One-Shot. The critical design principle is that the pipeline re-executes after intervention. New assessment data updates the proficiency vector, and topics that remain below threshold trigger fresh recommendations. This makes ALIGNAgent a cycle, not a pipeline -- each iteration refines the learner model.
Define the domain schema. Create a topic taxonomy for your subject area as a structured map: {topic_id: {name, prerequisite_topics[], concept_tags[]}}. Each assessment item must be tagged with its topic and concept(s).
Ingest learner data. Parse three input streams into a unified learner profile: (a) gradebook records with per-item scores normalized to [0, 1], (b) quiz/assessment responses including selected answers (not just correctness), and (c) learner preferences (modality, pacing, prior stated goals).
Compute topic-level proficiency. For each topic t, calculate rho_t = mean(normalized_scores for items tagged with t). Assemble the full proficiency vector rho(s) = [rho_t1, rho_t2, ..., rho_tn].
Classify proficiency levels. Map each rho_t to a categorical label using thresholds: High (rho_t >= 0.8), Medium (0.5 <= rho_t < 0.8), Low (rho_t < 0.5). These thresholds should be configurable.
Identify skill gaps. Apply mastery threshold tau (default: 0.7) to produce the gap set G(s) = {(t, rho_t) | rho_t < tau}. Sort gaps by ascending rho_t so the weakest topics are prioritized.
Run concept-level diagnostic reasoning. For each topic in the gap set, analyze the learner's incorrect responses: which distractors were selected, which concepts those distractors test, and whether errors cluster around a specific misconception. Produce a natural-language diagnostic per gap (e.g., "Consistently confuses DFS with BFS traversal order; correctly identifies base cases but fails on recursive step for graph cycles").
Construct retrieval queries. For each diagnosed gap, build a structured query combining: topic name, specific misconception description, and learner modality preference. Example: {topic: "AVL Trees", misconception: "rotation direction after double rotation", preference: "video tutorial"}.
Retrieve and filter resources. Search for candidate resources matching each query. Filter by: (a) topic relevance to the specific misconception, (b) modality alignment with learner preferences, (c) content quality signals. Return up to K resources per gap, ranked by alignment score.
Generate learner summary. Produce a structured output containing: strengths (topics with High proficiency), gaps (topics below threshold with diagnostic explanations), recommended actions (ordered resource list per gap), and next assessment targets.
Close the feedback loop. After the learner engages with resources and completes a new assessment, re-run steps 3-9 with updated data. Track proficiency delta per topic to measure intervention effectiveness and adjust recommendations accordingly.
Example 1: Analyzing quiz performance in a Python programming course
User: "I have quiz data for 30 students across 5 Python topics. Help me build a system that identifies each student's weak areas and recommends resources."
Approach:
student_id, quiz_id, topic, question_id, score, max_scoreOutput structure:
{
"student_id": "S042",
"proficiency_vector": {
"variables_types": 0.92,
"control_flow": 0.78,
"functions": 0.45,
"data_structures": 0.33,
"file_io": 0.60
},
"proficiency_labels": {
"variables_types": "High",
"control_flow": "Medium",
"functions": "Low",
"data_structures": "Low",
"file_io": "Medium"
},
"skill_gaps": [
{
"topic": "data_structures",
"proficiency": 0.33,
"diagnostic": "Correctly uses lists but fails when indexing nested structures. Confuses dict key access with list indexing. Missed all questions involving dictionary comprehensions.",
"resources": [
{"title": "Python Data Structures - Nested Collections", "type": "video", "url": "..."},
{"title": "Dictionary Comprehension Practice Problems", "type": "interactive", "url": "..."}
]
},
{
"topic": "functions",
"proficiency": 0.45,
"diagnostic": "Understands function definition syntax but struggles with return vs print distinction. Errors cluster around scope-related questions -- treats local variables as global.",
"resources": [
{"title": "Python Scoping Rules Explained", "type": "article", "url": "..."},
{"title": "Functions and Return Values - Step by Step", "type": "video", "url": "..."}
]
}
],
"strengths": ["variables_types"],
"next_assessment": ["data_structures", "functions"]
}
Example 2: Building the multi-agent system in code
User: "Implement the ALIGNAgent two-agent pipeline as a Python module."
Approach:
from dataclasses import dataclass, field
from typing import Optional
import json
@dataclass
class AssessmentItem:
question_id: str
topic: str
concepts: list[str]
score: float # normalized to [0, 1]
selected_answer: str # the actual answer chosen
correct_answer: str
@dataclass
class LearnerProfile:
student_id: str
preferences: dict # e.g., {"modality": "video", "pacing": "self-paced"}
assessments: list[AssessmentItem] = field(default_factory=list)
@dataclass
class SkillGap:
topic: str
proficiency: float
diagnostic: str # natural-language explanation
misconceptions: list[str]
class SkillGapAgent:
def __init__(self, mastery_threshold: float = 0.7, llm_client=None):
self.tau = mastery_threshold
self.llm = llm_client
def compute_proficiency(self, profile: LearnerProfile) -> dict[str, float]:
topic_scores: dict[str, list[float]] = {}
for item in profile.assessments:
topic_scores.setdefault(item.topic, []).append(item.score)
return {t: sum(s) / len(s) for t, s in topic_scores.items()}
def identify_gaps(self, proficiency: dict[str, float]) -> list[tuple[str, float]]:
gaps = [(t, rho) for t, rho in proficiency.items() if rho < self.tau]
return sorted(gaps, key=lambda x: x[1]) # weakest first
def diagnose(self, profile: LearnerProfile, gaps: list[tuple[str, float]]) -> list[SkillGap]:
diagnosed = []
for topic, rho in gaps:
incorrect = [a for a in profile.assessments
if a.topic == topic and a.score < 1.0]
# Use LLM for concept-level diagnostic reasoning
prompt = self._build_diagnostic_prompt(topic, incorrect)
reasoning = self.llm.generate(prompt, temperature=0)
diagnosed.append(SkillGap(
topic=topic, proficiency=rho,
diagnostic=reasoning["explanation"],
misconceptions=reasoning["misconceptions"]
))
return diagnosed
def _build_diagnostic_prompt(self, topic: str, incorrect_items: list) -> str:
items_desc = json.dumps([{
"question_id": i.question_id,
"concepts": i.concepts,
"selected": i.selected_answer,
"correct": i.correct_answer
} for i in incorrect_items], indent=2)
return f"""Analyze these incorrect responses for topic "{topic}".
Identify specific misconceptions by examining which distractors were chosen and what conceptual errors they reveal.
Incorrect responses:
{items_desc}
Return JSON with:
- "explanation": one paragraph describing the pattern of errors
- "misconceptions": list of specific misconception strings"""
class RecommenderAgent:
def __init__(self, max_resources_per_gap: int = 3, search_fn=None):
self.k = max_resources_per_gap
self.search = search_fn
def recommend(self, gaps: list[SkillGap], preferences: dict) -> dict[str, list]:
recommendations = {}
for gap in gaps:
query = self._build_query(gap, preferences)
candidates = self.search(query)
filtered = [r for r in candidates
if self._matches_preference(r, preferences)]
recommendations[gap.topic] = filtered[:self.k]
return recommendations
def _build_query(self, gap: SkillGap, preferences: dict) -> str:
modality = preferences.get("modality", "any")
return f"{gap.topic} {gap.misconceptions[0]} {modality} tutorial"
def _matches_preference(self, resource: dict, preferences: dict) -> bool:
modality = preferences.get("modality")
if not modality or modality == "any":
return True
return resource.get("type", "").lower() == modality.lower()
class ALIGNAgentPipeline:
def __init__(self, mastery_threshold=0.7, max_resources=3,
llm_client=None, search_fn=None):
self.skill_gap_agent = SkillGapAgent(mastery_threshold, llm_client)
self.recommender_agent = RecommenderAgent(max_resources, search_fn)
def run(self, profile: LearnerProfile) -> dict:
proficiency = self.skill_gap_agent.compute_proficiency(profile)
gaps = self.skill_gap_agent.identify_gaps(proficiency)
diagnosed = self.skill_gap_agent.diagnose(profile, gaps)
resources = self.recommender_agent.recommend(diagnosed, profile.preferences)
return {
"student_id": profile.student_id,
"proficiency": proficiency,
"strengths": [t for t, rho in proficiency.items() if rho >= 0.8],
"gaps": [{
"topic": g.topic,
"proficiency": g.proficiency,
"diagnostic": g.diagnostic,
"misconceptions": g.misconceptions,
"resources": resources.get(g.topic, [])
} for g in diagnosed],
"next_assessment_targets": [g.topic for g in diagnosed]
}
Example 3: Retrofitting an existing quiz app with adaptive diagnostics
User: "I have a Flask quiz app that stores results in PostgreSQL. Add adaptive gap analysis after each quiz submission."
Approach:
learner_proficiency table/api/student/<id>/gaps endpoint that returns diagnosed gaps with recommendations# Added to existing Flask app -- new endpoint
@app.route("/api/student/<student_id>/gaps", methods=["GET"])
def get_student_gaps(student_id):
# Fetch assessment history from DB
assessments = db.session.query(QuizResponse).filter_by(
student_id=student_id
).all()
profile = LearnerProfile(
student_id=student_id,
preferences=get_learner_preferences(student_id),
assessments=[to_assessment_item(a) for a in assessments]
)
pipeline = ALIGNAgentPipeline(
mastery_threshold=0.7,
llm_client=openai_client,
search_fn=search_educational_resources
)
result = pipeline.run(profile)
# Store updated proficiency for delta tracking
store_proficiency_snapshot(student_id, result["proficiency"])
return jsonify(result)
tau=0.7 and calibrate by comparing against held-out exam performance.ALIGNAgent: Adaptive Learner Intelligence for Gap Identification and Next-step guidance -- Focus on Section 3 (Framework Architecture), Algorithm 1 (Skill Gap identification), Algorithm 2 (Resource Recommendation), and Tables 1-2 (evaluation results showing 0.87-0.90 precision with GPT-4o agents).
development
Audit LLM-based automatic short answer grading (ASAG) systems for adversarial vulnerabilities using token-level and prompt-level attack strategies from the GradingAttack framework. Triggers: 'test grading robustness', 'adversarial attack on grading', 'audit LLM grader', 'red-team answer grading', 'ASAG vulnerability assessment', 'grading fairness attack'
development
Build structured information-seeking agents that decompose complex queries into multi-turn search-and-browse workflows, aggregate results from multiple web sources, and return answers in typed structured formats (items, sets, lists, tables). Applies the GISA benchmark's ReAct-based agent architecture and evaluation methodology. Trigger phrases: "build an information-seeking agent", "search agent pipeline", "multi-turn web research agent", "structured web search workflow", "aggregate information from multiple sources", "web research with structured output"
data-ai
Optimize LLM prompts using GFlowPO's iterative generate-evaluate-refine loop with diversity-preserving exploration and dynamic memory. Use when: 'optimize this prompt', 'find a better prompt for this task', 'prompt engineering with examples', 'auto-tune my system prompt', 'improve prompt accuracy', 'generate prompt variations'.
development
Constrain LLM generation with executable Pydantic schemas and multi-agent pipelines to produce structurally valid, domain-rich artifacts. Uses ontology-as-grammar to eliminate hallucinated structures while preserving creative output. Trigger phrases: "generate a valid game design", "schema-constrained generation", "build a multi-agent pipeline with Pydantic validation", "ontology-driven content generation", "structured creative generation with DSPy", "generate artifacts that pass domain validation".