skills/domains/ai-ml/ai-agent-papers-guide/SKILL.md
Curated 2024-2026 AI agent research papers collection
npx skillsauth add wentorai/research-plugins ai-agent-papers-guideInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
A focused collection of AI agent research papers from 2024-2026, tracking the latest developments in LLM-based agent systems. Unlike broader collections, this focuses on recent breakthroughs — new architectures, benchmarks, multi-agent coordination, and real-world applications. Updated frequently as the field evolves rapidly.
Recent AI Agent Research
├── Agent Architectures
│ ├── Planning (o1-style reasoning, search-augmented)
│ ├── Memory (long-term, episodic, working)
│ └── Tool use (function calling, code execution)
├── Multi-Agent Systems
│ ├── Collaboration (task decomposition, debate)
│ ├── Competition (red team, adversarial)
│ └── Emergence (self-organization, culture)
├── Evaluation
│ ├── Benchmarks (SWE-bench, WebArena, GAIA)
│ ├── Safety (jailbreak, misuse, alignment)
│ └── Reliability (error recovery, hallucination)
├── Applications
│ ├── Software engineering (coding agents)
│ ├── Scientific research (lab automation)
│ ├── Web automation (browsing, form-filling)
│ └── Enterprise (workflow, data analysis)
└── Infrastructure
├── Frameworks (LangGraph, CrewAI, AutoGen)
├── Protocols (MCP, A2A, tool standards)
└── Deployment (scaling, monitoring, cost)
| Paper | Venue | Key Contribution | |-------|-------|-----------------| | SWE-agent | ICLR 2025 | Agent interface design for SE | | OpenHands | 2024 | Open platform for coding agents | | AgentBench | ICLR 2024 | Multi-environment agent benchmark | | GAIA | ICLR 2024 | General AI assistant benchmark | | Voyager | NeurIPS 2024 | Lifelong learning in Minecraft | | OS-Copilot | 2024 | Self-improving computer agent | | AutoGen | 2024 | Multi-agent conversation framework | | Agent-FLAN | ACL 2024 | Agent fine-tuning methodology |
import arxiv
from datetime import datetime, timedelta
def find_recent_agent_papers(days=14):
"""Find cutting-edge agent papers."""
queries = [
"ti:agent AND (ti:LLM OR ti:language model)",
"abs:autonomous agent AND abs:tool use AND abs:2024",
"ti:multi-agent AND abs:large language",
"abs:coding agent OR abs:software agent",
]
seen = set()
papers = []
for q in queries:
search = arxiv.Search(
query=q, max_results=15,
sort_by=arxiv.SortCriterion.SubmittedDate,
)
for r in search.results():
if r.entry_id not in seen:
seen.add(r.entry_id)
papers.append({
"title": r.title,
"date": r.published.strftime("%Y-%m-%d"),
"url": r.entry_id,
})
papers.sort(key=lambda x: x["date"], reverse=True)
for p in papers[:20]:
print(f"[{p['date']}] {p['title']}")
print(f" {p['url']}")
find_recent_agent_papers()
frameworks = {
"LangGraph": {
"paradigm": "Graph-based workflows",
"persistence": "Built-in checkpointing",
"multi_agent": "Yes",
"language": "Python/JS",
},
"CrewAI": {
"paradigm": "Role-based agents",
"persistence": "Memory module",
"multi_agent": "Yes (crew)",
"language": "Python",
},
"AutoGen": {
"paradigm": "Conversational agents",
"persistence": "Chat history",
"multi_agent": "Yes (group chat)",
"language": "Python/.NET",
},
"OpenHands": {
"paradigm": "Computer use agent",
"persistence": "Workspace state",
"multi_agent": "No",
"language": "Python",
},
}
for name, info in frameworks.items():
print(f"\n{name}:")
for k, v in info.items():
print(f" {k}: {v}")
documentation
Write Tsinghua University theses using the ThuThesis LaTeX template
development
Templates, formatting rules, and strategies for thesis and dissertation writing
documentation
Set up LaTeX templates for PhD and Master's thesis documents
documentation
Write SJTU theses using the SJTUThesis LaTeX template with full compliance