skills/machine_learning/mace-dataset-curation/SKILL.md
Use this skill for turning VASP result trees into extxyz training datasets that follow the validated reference-script conventions for REF labels, optional head/config_type tags, and fixed split artifacts before MACE training.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

`npx skillsauth add q734738781/CatMaster mace-dataset-curation`
Use this skill to convert collected VASP runs into a reusable extxyz dataset directory while staying close to the validated reference_scripts/mace_training_example export contract.
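To make the REF-label export contract more concrete, here is a minimal, dependency-free sketch of how one frame might be serialized to extended XYZ with `REF_energy`, `REF_forces`, `REF_stress`, `config_type`, and an optional `head` key. `Frame` and `frame_to_extxyz` are hypothetical names and are not this skill's exporter. The units (eV, eV/Å, eV/Å³) and the nine-component row-major stress layout are assumptions, so check them against the reference scripts. In practice ASE's `ase.io.write(..., format="extxyz")` does this serialization. The sketch only shows what each frame header ends up carrying.

```python
"""Minimal sketch: one extended-XYZ frame with REF_* label keys.

Hypothetical helper, not the skill's real exporter. Assumed units: energy in eV,
forces in eV/Angstrom, stress as a 3x3 matrix in eV/Angstrom^3.
"""

from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Frame:
    symbols: List[str]
    positions: List[Sequence[float]]  # Cartesian coordinates, Angstrom
    cell: List[Sequence[float]]       # three lattice vectors, Angstrom
    energy: float                     # written as REF_energy
    forces: List[Sequence[float]]     # written as per-atom REF_forces
    stress: List[Sequence[float]]     # written as REF_stress (3x3, row-major)
    config_type: str = "Default"
    head: Optional[str] = None        # e.g. "omat_pbe"; omitted when None


def _fmt(values) -> str:
    """Format an iterable of numbers as space-separated fixed-point floats."""
    return " ".join(f"{v:.8f}" for v in values)


def frame_to_extxyz(frame: Frame) -> str:
    """Return one extended-XYZ frame: atom count, header line, one line per atom."""
    lattice = _fmt(x for vec in frame.cell for x in vec)
    stress = _fmt(x for row in frame.stress for x in row)
    header = [
        f'Lattice="{lattice}"',
        "Properties=species:S:1:pos:R:3:REF_forces:R:3",
        f"REF_energy={frame.energy:.8f}",
        f'REF_stress="{stress}"',
        f"config_type={frame.config_type}",
    ]
    if frame.head is not None:
        header.append(f"head={frame.head}")
    header.append('pbc="T T T"')

    lines = [str(len(frame.symbols)), " ".join(header)]
    for sym, pos, force in zip(frame.symbols, frame.positions, frame.forces):
        lines.append(f"{sym} {_fmt(pos)} {_fmt(force)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = Frame(
        symbols=["Pt", "O"],
        positions=[(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)],
        cell=[(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)],
        energy=-6.5,
        forces=[(0.0, 0.0, 0.1), (0.0, 0.0, -0.1)],
        stress=[(0.0, 0.0, 0.0)] * 3,
        config_type="relax",
        head="omat_pbe",
    )
    print(frame_to_extxyz(demo), end="")
```

Leaving `head` out entirely when it is unset keeps single-head datasets clean while still letting multi-head exports tag every frame explicitly.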
Workflow:

- Run `build_dataset_from_runs` on one result root.
- Choose `frame_mode` deliberately: `final` or `all_ionic_steps`.
- Set `head_label` explicitly, typically `omat_pbe`.
- Keep `split_unit="source_run"` unless you intentionally want frame-level leakage.
- Keep `require_converged=false` unless you are intentionally building a guessed-converged subset.

Frame modes:

- `all_ionic_steps` is the default starting point for `build_dataset_from_runs` when you want to match the reference-script path.
- `final` is only for deliberately reduced datasets where relaxed endpoints alone are the training target.

Outputs and handoff:

- Treat `dataset.extxyz`, `train.extxyz`, `valid.extxyz`, and `test.extxyz` as the canonical handoff set.
- Keep the optional `head_label` / `config_type` tags.
- Expected label keys: `REF_energy`, `REF_forces`, `REF_stress`, `config_type`, and optionally `head`.
- When targeting `mace-mh-1` with `omat_pbe`, set `head_label="omat_pbe"` instead of assuming the head will be inferred later.
- Hand off training to `mace-finetuning-and-benchmark` or `active-learning-relabel-loop` instead of mixing curation and training into one opaque step.

Guardrails:

- Keep `step_electronic_converged_guess` for later filtering; use `require_converged=true` only when that hard subset is the actual dataset target.
- Keep `alignment_check=true` unless you are deliberately debugging malformed XML; XML/ASE step-order mismatches should cause the run to be skipped, not silently truncated into the dataset.
- Set `head_label` when the dataset is intended for multi-head foundation-model finetuning.
- Use `split_unit="source_run"` for trajectory-style data unless you intentionally want frame-level mixing across train/valid/test.

Return:
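The frame-mode, convergence, alignment, and `split_unit="source_run"` rules above can be sketched in plain Python. `IonicStep`, `select_frames`, and `split_by_source_run` are hypothetical names, not the real `build_dataset_from_runs` API. The sketch assumes per-step data has already been parsed separately from vasprun.xml and ASE, and the seeded 80/10/10 shuffle over runs is an illustrative choice rather than the skill's actual split rule.

```python
"""Sketch of frame selection and source-run splitting (hypothetical API)."""

import random
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class IonicStep:
    source_run: str                   # run directory the step came from
    index: int                        # ionic step index within that run
    electronic_converged_guess: bool  # heuristic per-step convergence flag


def select_frames(
    xml_steps: Sequence[IonicStep],
    ase_step_count: int,
    frame_mode: str = "all_ionic_steps",
    require_converged: bool = False,
    alignment_check: bool = True,
) -> List[IonicStep]:
    """Pick frames from one run; return [] (skip the run) on XML/ASE misalignment."""
    if frame_mode not in ("final", "all_ionic_steps"):
        raise ValueError(f"unknown frame_mode: {frame_mode!r}")
    if alignment_check and len(xml_steps) != ase_step_count:
        # Skip the whole run instead of truncating to the shorter sequence.
        return []
    if frame_mode == "all_ionic_steps":
        steps = list(xml_steps)
    else:
        steps = list(xml_steps[-1:])  # relaxed endpoint only
    if require_converged:
        # Hard subset: drop steps whose convergence guess is False.
        steps = [s for s in steps if s.electronic_converged_guess]
    return steps


def split_by_source_run(
    frames: Sequence[IonicStep],
    fractions: Sequence[float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Dict[str, List[IonicStep]]:
    """Assign whole source runs to train/valid/test so no run spans two splits.

    The test share is whatever remains after the train and valid counts.
    """
    runs = sorted({f.source_run for f in frames})
    random.Random(seed).shuffle(runs)  # deterministic for a fixed seed
    n_train = round(fractions[0] * len(runs))
    n_valid = round(fractions[1] * len(runs))

    assignment: Dict[str, str] = {}
    for i, run in enumerate(runs):
        if i < n_train:
            assignment[run] = "train"
        elif i < n_train + n_valid:
            assignment[run] = "valid"
        else:
            assignment[run] = "test"

    splits: Dict[str, List[IonicStep]] = {"train": [], "valid": [], "test": []}
    for f in frames:
        splits[assignment[f.source_run]].append(f)
    return splits
```

Splitting on whole runs keeps consecutive, highly correlated ionic steps from one trajectory out of both train and test, which is the frame-level leakage the guardrails warn about.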