skills/arkeval-benchmarking-evaluating-automated/SKILL.md
Automated ArkTS code repair using retrieval-augmented generation, LLM-based test oracle synthesis, and structured benchmark evaluation for HarmonyOS development. Use when: 'fix this ArkTS error', 'repair HarmonyOS code', 'convert TypeScript to ArkTS', 'ArkTS compilation error', 'debug HarmonyOS component', 'generate tests for ArkTS code'.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add ndpvt-web/arxiv-claude-skills arkeval-benchmarking-evaluating-automated
This skill enables Claude to diagnose and repair ArkTS code using the retrieval-augmented repair (RAR) workflow from the ArkEval framework. ArkTS is a statically typed extension of TypeScript used for HarmonyOS development that rejects many valid TypeScript patterns at compile time. The ArkEval approach combines semantic fault localization, documentation-augmented patch generation, and LLM-consensus test oracle synthesis to systematically repair code in this low-resource language domain where conventional tools fall short.
any types, structural typing)
aboutToAppear/aboutToDisappear hooks
Retrieval-Augmented Repair (RAR) Pipeline. ArkEval's core workflow operates in three stages: (1) a Function Locator performs semantic fault localization to identify the buggy file and function, (2) a Patch Generator produces candidate fixes augmented by relevant HarmonyOS documentation and sample code retrieved via semantic search, and (3) a Patch Executor applies and verifies the fix against test oracles. The retrieval knowledge base is built from 15,000+ official HarmonyOS documentation pages and 400+ sample applications, chunked using AST-aware splitting (tree-sitter parsing at class/function boundaries) and a 512-token sliding window for prose, then embedded with a code-aware model and stored in a vector database for sub-millisecond retrieval.
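The prose half of the chunking step can be sketched as follows. This is a minimal illustration, not ArkEval's implementation: whitespace-separated words stand in for real tokenizer output, and the 64-token overlap is an assumption, since the description above specifies only the 512-token window. The function name `slidingWindowChunks` is hypothetical.

```typescript
// Sketch only: whitespace tokens approximate a real tokenizer, and the
// default overlap of 64 is an assumed value (the source gives only 512).
function slidingWindowChunks(
  text: string,
  windowSize: number = 512,
  overlap: number = 64,
): string[] {
  if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
    throw new Error("require windowSize > 0 and 0 <= overlap < windowSize");
  }
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const stride = windowSize - overlap; // how far the window advances each step
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + windowSize).join(" "));
    // Stop once the current window already covers the last token.
    if (start + windowSize >= tokens.length) {
      break;
    }
  }
  return chunks;
}
```

Overlapping windows keep a sentence that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.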
ArkTS "False Friends." The central challenge is that ArkTS looks like TypeScript but rejects at compile time many patterns that TypeScript accepts and resolves at runtime. Common traps include: runtime property addition on objects (must use pre-declared class fields), any-typed variables (must use explicit types), JSON.parse() without class casting, dynamic property access via bracket notation, and event binding via .on('click') instead of ArkTS's .onClick(() => {...}) pattern. These "false friends" cause LLMs trained on TypeScript to generate plausible but invalid ArkTS code. The repair workflow must retrieve ArkTS-specific documentation to override TypeScript habits.
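Two of the traps above (untyped JSON.parse() results and bracket-notation property access) can be avoided with a style like the one sketched here. The class `UserProfile` and the function `parseProfile` are hypothetical names for illustration. The snippet is plain TypeScript written in the stricter style and has not been checked against the ArkTS compiler.

```typescript
// Hypothetical example class: every field is declared up front.
class UserProfile {
  name: string = "";
  age: number = 0;
}

// TypeScript habit to avoid:
//   const raw: any = JSON.parse(text); raw["age"] = 30;
// Stricter style: cast the parse result to a declared class type,
// then copy known fields using dot access only.
function parseProfile(text: string): UserProfile {
  const parsed = JSON.parse(text) as UserProfile;
  const profile = new UserProfile();
  profile.name = parsed.name;
  profile.age = parsed.age;
  return profile;
}

const profile = parseProfile('{"name":"Alice","age":30}');
console.log(profile.name, profile.age); // prints "Alice 30"
```

Copying into a fresh instance also means the returned object is a real `UserProfile`, not just a parsed object literal carrying a type label.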
LLM-Consensus Test Oracle Synthesis. When test suites are absent, ArkEval generates test oracles using a three-model committee: one model generates tests, two others independently score each test on syntax correctness, logic plausibility, and API validity (0-10 scale). A test is accepted only if the standard deviation of scores is below 1.5, ensuring strong inter-model agreement. Accepted tests must then pass dual verification: fail on the buggy code and pass on the fixed code.
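The acceptance rule can be sketched as a small function. Two details are assumptions, because the description above does not specify them: all scores from both judges are pooled into one list, and the population (not sample) standard deviation is used. The names `JudgeScore`, `stdDev`, and `isAccepted` are hypothetical.

```typescript
// One judge's 0-10 scores for a candidate test (hypothetical shape).
interface JudgeScore {
  syntax: number;
  logic: number;
  api: number;
}

// Population standard deviation (assumption: the source does not say
// whether the population or sample formula is used).
function stdDev(values: number[]): number {
  if (values.length === 0) {
    throw new Error("no scores to aggregate");
  }
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / values.length;
  return Math.sqrt(variance);
}

// Accept a test only when the judges' scores agree closely.
// Assumption: all criteria from all judges are pooled before measuring spread.
function isAccepted(judges: JudgeScore[], threshold: number = 1.5): boolean {
  const pooled: number[] = [];
  for (const judge of judges) {
    pooled.push(judge.syntax, judge.logic, judge.api);
  }
  return stdDev(pooled) < threshold;
}
```

Note that a low spread signals agreement only; a committee that agrees on low scores would also pass this check, so a minimum-score filter may be needed as well.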
Classify the defect type. Examine the error and categorize it as one of three ArkTS-specific categories: (a) strict compile-time violation (35% of real bugs)--valid TypeScript rejected by ArkTS AOT compiler, (b) UI state desynchronization (42%)--imperative logic failing to trigger declarative UI updates, or (c) component lifecycle mismanagement (23%)--incorrect initialization/disposal in lifecycle hooks.
Perform semantic fault localization. Identify the specific file, class, and function containing the bug. For compile-time violations, trace the exact line from the compiler error. For UI state bugs, trace the data flow from the state variable through @State/@Link/@Prop decorators to the component that fails to re-render. For lifecycle bugs, check aboutToAppear and aboutToDisappear hook ordering.
Retrieve relevant ArkTS documentation and examples. Search for official HarmonyOS API documentation and sample code that covers the specific component, decorator, or API involved. Focus on ArkTS-specific patterns that differ from standard TypeScript. Prioritize official samples over generic TypeScript advice.
Identify the TypeScript-to-ArkTS "false friend" pattern. Check if the buggy code uses a TypeScript idiom that ArkTS rejects. Common false friends:
let x: any = {...} --> must declare a typed class
obj.newProp = value --> must pre-declare all properties in class definition
JSON.parse(str) --> must cast to explicit class type
component.on('event') --> must use .onClick(() => {...}) style
.push() without triggering reactivity --> use ObservedArray or replace reference
Generate the patch using documentation-augmented context. Write the corrected code incorporating the retrieved ArkTS patterns. Ensure the patch:
any
Verify compilation. Mentally or actually compile the patched code against ArkTS constraints. Check that no dynamic property access, undeclared fields, or implicit any types remain.
Generate a test oracle if no tests exist. Write at least one test that exercises the repaired behavior. Apply the dual-verification principle: the test should fail on the original buggy code and pass on the repaired code. If uncertain about test quality, generate multiple candidate tests and keep only those where the expected behavior is unambiguous.
Validate reactivity for UI bugs. If the bug involved state desynchronization, verify that the fix causes a reference change (not just mutation) for @State arrays/objects, or uses @Observed/@ObjectLink decorators correctly, so the declarative UI layer detects the change.
Document the repair rationale. Explain which ArkTS constraint was violated and why the TypeScript pattern was invalid, so the user understands the root cause and avoids the pattern in future code.
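The dual-verification principle from the test-oracle step can be sketched as a predicate over a candidate test. All names here (`Impl`, `TestOracle`, `passesDualVerification`, and the counting functions) are hypothetical, and treating a thrown exception as a failing test is an assumption.

```typescript
type Impl<I, O> = (input: I) => O;
// A test oracle returns true when the implementation under test passes.
type TestOracle<I, O> = (impl: Impl<I, O>) => boolean;

// Assumption: a test that throws counts as a failure.
function testPasses<I, O>(test: TestOracle<I, O>, impl: Impl<I, O>): boolean {
  try {
    return test(impl);
  } catch (_err) {
    return false;
  }
}

// Keep a test only if it fails on the buggy code and passes on the fix.
function passesDualVerification<I, O>(
  test: TestOracle<I, O>,
  buggy: Impl<I, O>,
  fixed: Impl<I, O>,
): boolean {
  return !testPasses(test, buggy) && testPasses(test, fixed);
}

// Hypothetical defect: an off-by-one item count.
const buggyCount: Impl<string[], number> = (items) => items.length - 1;
const fixedCount: Impl<string[], number> = (items) => items.length;

const discriminating: TestOracle<string[], number> = (impl) => impl(["a", "b"]) === 2;
const vacuous: TestOracle<string[], number> = (impl) => impl(["a"]) >= 0;

console.log(passesDualVerification(discriminating, buggyCount, fixedCount)); // true
console.log(passesDualVerification(vacuous, buggyCount, fixedCount)); // false
```

The vacuous test is rejected because it passes on both versions, so it says nothing about whether the repair worked.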
Example 1: Strict compile-time violation -- dynamic property addition
User: "I'm getting a compilation error on this ArkTS code: let user: any = { name: 'Alice' }; user.age = 30;"
Approach:
any type and runtime property addition.
any type annotation and dynamic .age assignment.
any and dynamic properties; ArkTS does not.
Output:
// BEFORE (invalid ArkTS -- TypeScript "false friend")
let user: any = { name: "Alice" };
user.age = 30;
// AFTER (valid ArkTS)
class User {
  name: string = "";
  age: number = 0;
}
let user = new User();
user.name = "Alice";
user.age = 30;
Explanation: ArkTS's AOT compiler requires statically known types. Replace any-typed object literals with class instances where all properties are pre-declared.
Example 2: UI state desynchronization -- array mutation not triggering re-render
User: "My ArkTS list component doesn't update when I push items to the array."
Approach:
.push() mutates in-place without changing the reference.
@State array and the method that pushes to it.
Output:
// BEFORE (UI does not re-render)
@Entry
@Component
struct TodoList {
  @State items: string[] = [];
  addItem(item: string) {
    this.items.push(item); // Mutation -- UI won't detect change
  }
}
// AFTER (UI re-renders correctly)
@Entry
@Component
struct TodoList {
  @State items: string[] = [];
  addItem(item: string) {
    this.items = [...this.items, item]; // New reference -- triggers UI update
  }
}
Explanation: ArkTS's declarative UI framework detects state changes by reference comparison. In-place mutations like .push() don't create a new reference. Spread into a new array to trigger reactivity.
Example 3: Component lifecycle mismanagement -- event listener leak
User: "My HarmonyOS app crashes after navigating away from a page that listens to sensor events."
Approach:
aboutToAppear for listener registration, check aboutToDisappear for missing cleanup.
Output:
// Import path assumed; adjust to the module path your SDK version uses.
import sensor from '@ohos.sensor';
// BEFORE (listener leak -- crashes on navigation)
@Entry
@Component
struct SensorPage {
  aboutToAppear() {
    sensor.on(sensor.SensorId.ACCELEROMETER, (data) => {
      // handle sensor data
    });
  }
  // Missing: aboutToDisappear with sensor.off()
}
// AFTER (properly managed lifecycle)
@Entry
@Component
struct SensorPage {
  // Stored as a field so on() and off() receive the same function reference
  private sensorCallback = (data: sensor.AccelerometerResponse) => {
    // handle sensor data
  };
  aboutToAppear() {
    sensor.on(sensor.SensorId.ACCELEROMETER, this.sensorCallback);
  }
  aboutToDisappear() {
    sensor.off(sensor.SensorId.ACCELEROMETER, this.sensorCallback);
  }
}
Explanation: ArkTS components must clean up external subscriptions in aboutToDisappear. Store the callback as a class field so the same reference can be used for both .on() and .off().
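Why the callback has to be stored can be shown with a tiny stand-in for an on/off subscription API. `TinyEmitter` is a hypothetical class written for this illustration and is not the HarmonyOS sensor module. It assumes listeners are removed by reference identity, which is how many subscription APIs behave.

```typescript
type Listener = (value: number) => void;

// Hypothetical emitter that removes listeners by reference identity.
class TinyEmitter {
  private listeners: Listener[] = [];

  on(listener: Listener): void {
    this.listeners.push(listener);
  }

  off(listener: Listener): void {
    this.listeners = this.listeners.filter((l) => l !== listener);
  }

  count(): number {
    return this.listeners.length;
  }
}

const emitter = new TinyEmitter();

// Leaky pattern: the arrow function passed to off() is a new object,
// so it never matches the one registered with on().
emitter.on((v) => { void v; });
emitter.off((v) => { void v; });
console.log(emitter.count()); // 1 (the first listener is still registered)

// Symmetric pattern: the same stored reference is used for both calls.
const handler: Listener = (v) => { void v; };
emitter.on(handler);
emitter.off(handler);
console.log(emitter.count()); // 1 (only the leaked listener remains)
```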
@State-decorated arrays or objects in place. Always create a new reference to trigger the declarative UI update cycle.
any types, dynamic property access, or unsupported TypeScript syntax (e.g., optional chaining on untyped values). Retrieve documentation for the specific API being used.
any, no dynamic properties, reference-based reactivity, symmetric lifecycle hooks.
Paper: ArkEval: Benchmarking and Evaluating Automated Code Repair for ArkTS (Xie et al., 2026). Key sections: Section 3 for the five-phase benchmark construction pipeline, Section 4 for the RAR workflow architecture and retrieval knowledge base design, and Section 5 for the evaluation results showing the three ArkTS defect categories and per-model repair rates.