skills/echo-open-research-platform/SKILL.md
Build and configure ECHO-style research platforms for running reproducible user studies comparing chat-based AI and web search interactions. Use when: 'set up a user study platform', 'build a chat vs search experiment', 'log participant interactions with LLMs', 'create a research workflow with surveys and tasks', 'export user study interaction traces', 'configure an IRB-compliant experiment with pre/post questionnaires'.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add ndpvt-web/arxiv-claude-skills echo-open-research-platform
This skill enables Claude to build, configure, and extend research platforms following the ECHO architecture — a serverless, three-layer system for running reproducible mixed-method user studies that compare human interaction with conversational AI (LLMs) and web search engines. ECHO's design integrates consent flows, configurable surveys, chat and search task sessions with fine-grained interaction logging, writing/judgment tasks, and structured data export — all orchestrated through an admin dashboard that requires minimal coding.
ECHO uses a serverless three-layer architecture: (1) a React + TailwindCSS frontend split into an administrator dashboard and a participant interface, (2) a Firebase backend using Firestore for real-time data storage and Firebase Authentication for user management, and (3) an external API layer that integrates LLM providers (OpenAI, Anthropic Claude, Google Gemini) and search APIs (Brave Search by default). This architecture eliminates the need for dedicated servers or clusters — the entire platform runs with Firebase credentials and API keys.
The core innovation is a configurable six-stage study workflow that researchers assemble through the admin dashboard: (1) background survey, (2) pre-task questionnaire, (3) main task (chat or search session), (4) post-task questionnaire, (5) experience survey, (6) end-of-study survey. Administrators can enable, disable, or reorder these stages via drag-and-drop. Survey instruments support four editing modes: View, Reorder, Form (interactive GUI), and JSON (direct import/export). Researchers can therefore work anywhere from no-code editing to full JSON configuration.
What distinguishes ECHO from ad-hoc study setups is its unified interaction trace logging. For chat sessions, it captures unique turn identifiers, full text content, typing start/end timestamps, and per-turn user ratings. For search sessions, it captures query text, typing timestamps, result list counts, clicked URLs with timestamps, and result rankings. An in-situ survey system triggers dynamic popup questionnaires based on configurable conditions (after N prompts, after N responses, at intervals, or before task submission), enabling measurement without disrupting the participant's flow.
Scaffold the three-layer project structure. Create a monorepo with two React apps (admin-dashboard/ and participant-app/), a shared Firebase configuration module, and an API integration layer. Use Vite with TailwindCSS (create-react-app also works but has been deprecated by the React team). Initialize Firebase with Firestore and Authentication.
Configure Firebase backend and authentication. Set up a Firestore database with collections for studies, participants, sessions, interactions, and responses. Enable Firebase Authentication for participant and administrator login. Store all study configurations as Firestore documents.
Build the administrator dashboard with study workflow configuration. Implement a drag-and-drop interface for the six default study stages. Each stage should be toggleable (enabled/disabled) and reorderable. Store the stage configuration as a JSON document in Firestore under the study record. In the example below, the consent step is tracked as an additional entry at order 0, ahead of the six configurable stages:
{
  "studyId": "study_001",
  "stages": [
    {"id": "consent", "enabled": true, "order": 0},
    {"id": "background_survey", "enabled": true, "order": 1},
    {"id": "pre_task", "enabled": true, "order": 2},
    {"id": "main_task", "enabled": true, "order": 3, "type": "chat"},
    {"id": "post_task", "enabled": true, "order": 4},
    {"id": "experience_survey", "enabled": true, "order": 5},
    {"id": "end_survey", "enabled": true, "order": 6}
  ],
  "llmProvider": "openai",
  "searchProvider": "brave",
  "inSituTriggers": {"afterPrompts": 3, "beforeSubmission": true}
}
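As a rough illustration of how the participant app might consume this document, the sketch below derives the ordered list of active stages from a stage configuration of the shape shown above. The helper name `resolveActiveStages` and the sample `exampleConfig` are hypothetical and not part of ECHO.

```javascript
// Hypothetical helper (not an ECHO API): returns the IDs of enabled stages,
// sorted by their "order" field.
function resolveActiveStages(studyConfig) {
  return studyConfig.stages
    .filter((stage) => stage.enabled)       // drop disabled stages
    .sort((a, b) => a.order - b.order)      // filter() returned a copy, so sorting is safe
    .map((stage) => stage.id);
}

// Sample config in the same shape as the JSON document above,
// deliberately stored out of order with one stage disabled.
const exampleConfig = {
  studyId: 'study_001',
  stages: [
    { id: 'main_task', enabled: true, order: 3, type: 'chat' },
    { id: 'consent', enabled: true, order: 0 },
    { id: 'pre_task', enabled: false, order: 2 },
    { id: 'background_survey', enabled: true, order: 1 },
  ],
};

// Logs the enabled stage IDs in order: consent, background_survey, main_task
console.log(resolveActiveStages(exampleConfig));
```

Sorting on read rather than relying on array position means a drag-and-drop reorder only has to rewrite the `order` values.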
Implement the survey configuration system with four editing modes. Build a survey editor component supporting: (a) View mode for read-only preview, (b) Reorder mode for drag-and-drop question ordering, (c) Form mode with a GUI for adding Likert scales, multiple choice, and open-ended questions, and (d) JSON mode for direct import/export of survey instrument definitions.
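For the JSON mode, imported definitions probably need checking before they are saved. The sketch below is one possible validator. The survey schema it assumes (`questions`, `id`, `type`, `points`, `options`) is hypothetical, since the source does not specify ECHO's survey format.

```javascript
// Assumed question types, matching the three mentioned for Form mode.
const QUESTION_TYPES = ['likert', 'multiple_choice', 'open_ended'];

// Hypothetical validator for a pasted survey definition (schema is an assumption).
function validateSurvey(jsonText) {
  let survey;
  try {
    survey = JSON.parse(jsonText);
  } catch (err) {
    return { ok: false, errors: ['Invalid JSON: ' + err.message] };
  }
  if (survey === null || typeof survey !== 'object' || !Array.isArray(survey.questions)) {
    return { ok: false, errors: ['Survey must be an object with a "questions" array'] };
  }

  const errors = [];
  const seenIds = new Set();
  survey.questions.forEach((q, index) => {
    const label = `questions[${index}]`;
    if (q === null || typeof q !== 'object') {
      errors.push(`${label}: must be an object`);
      return;
    }
    // Every question needs a unique, non-empty string id.
    if (typeof q.id !== 'string' || q.id === '') {
      errors.push(`${label}: missing id`);
    } else if (seenIds.has(q.id)) {
      errors.push(`${label}: duplicate id "${q.id}"`);
    } else {
      seenIds.add(q.id);
    }
    if (!QUESTION_TYPES.includes(q.type)) {
      errors.push(`${label}: unknown type "${q.type}"`);
    }
    // Type-specific checks: a Likert item needs a point count, a choice item needs options.
    if (q.type === 'likert' && !(Number.isInteger(q.points) && q.points >= 2)) {
      errors.push(`${label}: likert questions need an integer "points" >= 2`);
    }
    if (q.type === 'multiple_choice' && !(Array.isArray(q.options) && q.options.length >= 2)) {
      errors.push(`${label}: multiple_choice questions need at least 2 options`);
    }
  });
  return { ok: errors.length === 0, errors, survey };
}
```

Returning every error at once, rather than stopping at the first, lets the editor show the researcher the full list of problems in a single pass.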
Build the participant interface with consent flow. Create an IRB-compliant consent page with customizable text, acknowledgment checkboxes, and a demographic questionnaire. Route participants through the configured study stages sequentially, storing progress in Firestore so sessions can be resumed.
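Resumable routing can be reduced to a small pure function over the stored progress. The following is a minimal sketch. It assumes progress is kept as a list of completed stage IDs, and the names `nextStage` and `markCompleted` are hypothetical.

```javascript
// Hypothetical helper: returns the first stage not yet completed,
// or null once the participant has finished every stage.
function nextStage(orderedStageIds, completedStageIds) {
  const done = new Set(completedStageIds);
  const next = orderedStageIds.find((id) => !done.has(id));
  return next === undefined ? null : next;
}

// Hypothetical helper: returns a new progress list with stageId recorded once.
function markCompleted(completedStageIds, stageId) {
  return completedStageIds.includes(stageId)
    ? completedStageIds
    : [...completedStageIds, stageId];
}

// A participant who closed the tab after the background survey
// resumes at the pre-task questionnaire.
const stageOrder = ['consent', 'background_survey', 'pre_task', 'main_task'];
const savedProgress = ['consent', 'background_survey'];
console.log(nextStage(stageOrder, savedProgress)); // logs: pre_task
```

Because the next stage is always derived from the saved list, reloading the page or signing in from another device lands the participant on the same step.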
Implement the chat task interface. Build a three-panel layout: task description on the left, central conversation area with markdown rendering and code highlighting, and an optional note-taking sidebar on the right. Integrate LLM APIs (OpenAI, Claude, Gemini) with streaming responses. Add an in-situ rating mechanism for each chat turn (e.g., thumbs up/down or Likert scale).
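The per-turn rating mechanism can be modeled independently of any LLM provider. The sketch below keeps conversation state as immutable data, which suits React state updates. The function names and the `sessionId-tN` turn ID format are assumptions made for illustration.

```javascript
// Hypothetical conversation state helpers (not an ECHO API).
function createConversation(sessionId) {
  return { sessionId, turns: [] };
}

// Appends a turn with a unique ID derived from the session and turn position.
function appendTurn(conversation, role, content) {
  const turn = {
    turnId: `${conversation.sessionId}-t${conversation.turns.length + 1}`,
    role,              // 'user' or 'assistant'
    content,
    userRating: null,  // unrated until the participant responds
  };
  return { ...conversation, turns: [...conversation.turns, turn] };
}

// Records a rating (for example 1 / -1 for thumbs, or 1..5 for Likert) on one turn.
function rateTurn(conversation, turnId, rating) {
  if (!conversation.turns.some((turn) => turn.turnId === turnId)) {
    throw new Error(`Unknown turn: ${turnId}`);
  }
  return {
    ...conversation,
    turns: conversation.turns.map((turn) =>
      turn.turnId === turnId ? { ...turn, userRating: rating } : turn
    ),
  };
}
```

Each rated turn can then be passed to the interaction logger along with its `turnId`, so ratings stay attached to the exact response they describe.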
Implement the search task interface. Mirror the chat layout but replace the conversation area with a search box and SERP display showing titles, URLs, and snippet abstracts. Integrate a search API (Brave Search by default). Log clicked results with their ranking position and timestamps.
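To log ranking positions, the raw API response first has to be normalized into ranked SERP items. The sketch below assumes a response shaped like `{ web: { results: [{ title, url, description }] } }`, modeled on Brave's web search output. Treat that shape as an assumption and check it against the provider's documentation. The helper names and 1-based ranks are also illustrative choices.

```javascript
// Hypothetical normalizer: turns an assumed search response into ranked SERP items.
function toSerpItems(apiResponse) {
  const results = (apiResponse && apiResponse.web && apiResponse.web.results) || [];
  return results.map((result, index) => ({
    rank: index + 1,              // 1-based position on the results page
    title: result.title,
    url: result.url,
    snippet: result.description,
  }));
}

// Hypothetical helper: builds the click record for a result the participant opened.
// Returns null if the URL was not on the displayed page.
function buildClickEvent(serpItems, clickedUrl, clickedAt) {
  const item = serpItems.find((candidate) => candidate.url === clickedUrl);
  if (!item) return null;
  return {
    clickedUrl: item.url,
    clickedRank: item.rank,
    clickedAt,
    resultCount: serpItems.length,
  };
}

// Made-up sample response for illustration only.
const sampleResponse = {
  web: {
    results: [
      { title: 'First', url: 'https://example.com/a', description: 'Snippet A' },
      { title: 'Second', url: 'https://example.com/b', description: 'Snippet B' },
    ],
  },
};
const serp = toSerpItems(sampleResponse);
console.log(buildClickEvent(serp, 'https://example.com/b', 1700000000000).clickedRank); // logs: 2
```

Resolving the rank from the rendered list at click time keeps the logged position consistent with what the participant actually saw.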
Build the interaction trace logger. Create a logging service that writes to Firestore in real time. For chat: log turn ID, full text, typing start/end timestamps, and user ratings. For search: log query text, typing timestamps, result count, clicked URLs, click timestamps, and result rankings. Attach participant and session IDs to every logged event.
Implement in-situ popup surveys. Build a survey overlay component triggered by configurable conditions: after N prompts submitted, after N responses received, after N search queries, at timed intervals, or before task submission. Store trigger configuration per study and log all popup responses with timestamps.
Build the data export pipeline. Add a "View Responses" section to the admin dashboard that queries Firestore and exports structured CSV files for: participant demographics, pre/post-task responses, complete chat histories with timestamps, search queries with click data, in-situ survey responses, and participant notes.
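Chat content routinely contains commas, quotes, and newlines, so the export step needs proper CSV quoting. Below is a minimal serializer sketch following the usual RFC 4180 quoting rules. The function names are hypothetical, and rows are assumed to be plain objects already read from Firestore.

```javascript
// Quotes a cell only when it contains a comma, quote, or line break;
// embedded quotes are doubled. null/undefined become empty cells.
function escapeCsvCell(value) {
  if (value === null || value === undefined) return '';
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical exporter: serializes rows using an explicit column list
// so every file has a stable header order.
function toCsv(rows, columns) {
  const header = columns.map(escapeCsvCell).join(',');
  const lines = rows.map((row) =>
    columns.map((column) => escapeCsvCell(row[column])).join(',')
  );
  return [header, ...lines].join('\r\n');
}

const chatRows = [
  { turnId: 't1', role: 'user', content: 'He said "hi", ok' },
  { turnId: 't2', role: 'assistant', content: 'Line one\nLine two' },
];
console.log(toCsv(chatRows, ['turnId', 'role', 'content']));
```

Firestore `Timestamp` values would need converting before export (for instance with `toDate().toISOString()`), since `String(value)` on them does not produce a spreadsheet-friendly date.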
Example 1: Scaffolding a new ECHO-style research platform
User: "I need to build a platform for running user studies that compare how people find information using ChatGPT vs Google Search. It needs consent forms, surveys, and interaction logging."
Approach:
Create a monorepo with apps/admin and apps/participant directories.
Set up Firestore collections: studies, participants, sessions, chatInteractions, searchInteractions, surveyResponses.
Output structure:
echo-platform/
  apps/
    admin/                      # React admin dashboard
      src/
        components/
          StudyConfigurator.jsx
          SurveyEditor.jsx
          ResponseExporter.jsx
    participant/                # React participant interface
      src/
        components/
          ConsentFlow.jsx
          ChatTask.jsx
          SearchTask.jsx
          SurveyStep.jsx
          InSituPopup.jsx
  shared/
    firebase.js                 # Firestore + Auth config
    logger.js                   # Interaction trace logger
    apiClients.js               # LLM and Search API wrappers
  firestore.rules
  package.json
Example 2: Adding fine-grained interaction logging to an existing chat interface
User: "I already have a chat interface that talks to Claude. I need to add ECHO-style interaction logging that captures typing timestamps, turn ratings, and exports to CSV."
Approach:
Capture typingStart (onFocus or first keydown) and typingEnd (on submit) timestamps.
Logger implementation pattern:
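// logger.js
// Writes chat and search interaction traces to Firestore.
import { collection, addDoc, serverTimestamp } from 'firebase/firestore';

// Log one chat turn. Optional fields default to null because Firestore
// rejects undefined values (e.g. assistant turns have no typing timestamps).
export async function logChatInteraction(db, sessionId, data) {
  await addDoc(collection(db, 'chatInteractions'), {
    sessionId,
    turnId: data.turnId,
    role: data.role, // 'user' or 'assistant'
    content: data.content,
    typingStartedAt: data.typingStartedAt ?? null,
    typingEndedAt: data.typingEndedAt ?? null,
    // ?? keeps a legitimate rating of 0; || would turn it into null
    userRating: data.userRating ?? null,
    createdAt: serverTimestamp(), // server-assigned write time
  });
}

// Log one search query, plus the clicked result if there was one.
export async function logSearchInteraction(db, sessionId, data) {
  await addDoc(collection(db, 'searchInteractions'), {
    sessionId,
    queryText: data.queryText,
    typingStartedAt: data.typingStartedAt ?? null,
    typingEndedAt: data.typingEndedAt ?? null,
    resultCount: data.resultCount,
    clickedUrl: data.clickedUrl ?? null,
    // ?? preserves rank 0 if ranks are zero-based
    clickedRank: data.clickedRank ?? null,
    clickedAt: data.clickedAt ?? null,
    createdAt: serverTimestamp(), // server-assigned write time
  });
}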
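The typing timestamps passed to these logger functions have to be captured in the input component. One possible way is a small tracker like the sketch below. The name `createTypingTracker` and its handler names are hypothetical, and the clock is injectable so the logic can be tested without a browser.

```javascript
// Hypothetical tracker: records when typing starts (first keydown)
// and when it ends (submit). "now" defaults to the client clock.
function createTypingTracker(now = Date.now) {
  let startedAt = null;
  return {
    // Call from the input's onKeyDown (or onFocus); only the first call counts.
    onKeyDown() {
      if (startedAt === null) startedAt = now();
    },
    // Call on submit; returns the fields the logger expects and resets the tracker.
    onSubmit() {
      const endedAt = now();
      const result = {
        // If no keydown was seen (e.g. pasted text), fall back to the submit time.
        typingStartedAt: startedAt ?? endedAt,
        typingEndedAt: endedAt,
      };
      startedAt = null;
      return result;
    },
  };
}
```

In a chat input, the object returned by `onSubmit()` could be spread into the `data` argument of `logChatInteraction` along with the turn ID and content.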
Example 3: Configuring in-situ popup surveys for a within-subjects study
User: "I'm running a study where each participant does both a chat task and a search task. I need popup surveys that appear after every 3 prompts during chat and after every 2 queries during search, plus a final one before they submit."
Approach:
Configuration:
{
  "conditions": {
    "chat": {
      "inSituTriggers": [
        {"type": "afterPrompts", "every": 3, "surveyId": "mid_task_chat"},
        {"type": "beforeSubmission", "surveyId": "final_reflection"}
      ]
    },
    "search": {
      "inSituTriggers": [
        {"type": "afterQueries", "every": 2, "surveyId": "mid_task_search"},
        {"type": "beforeSubmission", "surveyId": "final_reflection"}
      ]
    }
  }
}
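A configuration like this could be evaluated with a small pure function that the task component calls after each prompt, query, or submit attempt. The sketch below is one possible reading of the trigger semantics. The `dueSurveys` name and the event shape (`type`, `count`) are assumptions, not ECHO's actual implementation.

```javascript
// Maps each trigger type from the configuration to the participant event that can fire it.
const EVENT_FOR_TRIGGER = {
  afterPrompts: 'prompt',
  afterQueries: 'query',
  beforeSubmission: 'submit',
};

// Hypothetical evaluator: returns the survey IDs that should pop up for an event.
// event.count is the running number of prompts or queries in the current task.
function dueSurveys(triggers, event) {
  return triggers
    .filter((trigger) => EVENT_FOR_TRIGGER[trigger.type] === event.type)
    .filter((trigger) =>
      trigger.type === 'beforeSubmission' ||
      (event.count > 0 && event.count % trigger.every === 0)
    )
    .map((trigger) => trigger.surveyId);
}

// The chat triggers from the configuration above.
const chatTriggers = [
  { type: 'afterPrompts', every: 3, surveyId: 'mid_task_chat' },
  { type: 'beforeSubmission', surveyId: 'final_reflection' },
];

console.log(dueSurveys(chatTriggers, { type: 'prompt', count: 3 })); // logs: [ 'mid_task_chat' ]
console.log(dueSurveys(chatTriggers, { type: 'submit' }));           // logs: [ 'final_reflection' ]
```

Because the popup decision depends only on the configuration and a running count, the same function can serve both conditions in a within-subjects design.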
Use serverTimestamp() rather than client-side timestamps to avoid clock skew across participants.
Liu, J., Dinesh, N., & Yu, R. (2026). ECHO: An Open Research Platform for Evaluation of Chat, Human Behavior, and Outcomes. SIGIR '26. arXiv:2602.10295. Read for the full architecture diagram, Firestore collection schemas, admin dashboard screenshots, and detailed descriptions of the six-stage configurable study workflow with in-situ survey trigger conditions.