Pre-registration, open data, and FAIR principles for research
Implement open science practices including study pre-registration, open data sharing, registered reports, and FAIR data principles to increase research transparency and reproducibility.
Open science practices address the replication crisis and increase trust in research findings:
| Practice | Problem It Addresses |
|----------|---------------------|
| Pre-registration | Guards against HARKing (hypothesizing after results are known) and p-hacking |
| Open data | Enables verification, reanalysis, and meta-analyses |
| Open materials | Allows exact replication of studies |
| Open access | Removes paywalls that limit access to knowledge |
| Registered reports | Reduces publication bias (papers are accepted before results are known) |
| Open code | Enables computational reproducibility |
Pre-registration commits you to your research plan before seeing the data:
Pre-registration template (standard fields):
1. HYPOTHESES
- H1: [Specific, directional hypothesis]
- H2: [Another hypothesis]
2. DESIGN
- Study type: [Experiment / Survey / Observational]
- Between/within subjects design: [Details]
- Conditions: [List experimental conditions]
3. SAMPLING PLAN
- Sample size: [N = X, justified by power analysis]
- Stopping rule: [When will data collection stop?]
- Inclusion/exclusion criteria: [List]
4. VARIABLES
- Independent variables: [List with levels]
- Dependent variables: [List with measurement details]
- Covariates: [List any control variables]
5. ANALYSIS PLAN
- Primary analysis: [Exact statistical test, e.g., "2x3 mixed ANOVA"]
- Secondary analyses: [Additional planned analyses]
- Inference criteria: [alpha level, correction for multiple comparisons]
- Data exclusions: [How will outliers or failed attention checks be handled?]
- Missing data: [How will missing data be handled?]
6. OTHER
- Exploratory analyses: [Analyses not tied to specific hypotheses]
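The sampling plan asks for a sample size justified by a power analysis. Below is a minimal sketch for the common two-group case, assuming a two-sided, two-sample t-test and a smallest effect size of interest expressed as Cohen's d. It uses the normal approximation n = 2((z₁₋α/₂ + z_power) / d)², not the exact noncentral-t calculation. The function name `n_per_group` and the example effect size are illustrative assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample t-test.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2.
    It slightly underestimates the exact t-test answer.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # two-sided critical value
    z_power = z.inv_cdf(power)             # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    # Assumed smallest effect size of interest: d = 0.5
    print(n_per_group(0.5))  # 63 per group under the normal approximation
```

Exact calculations based on the noncentral t distribution give 64 per group for d = 0.5, α = .05, and power = .80. Either way, record the inputs (d, α, power) in the pre-registration so reviewers can reproduce the number.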
| Platform | URL | Disciplines | Features |
|----------|-----|-------------|----------|
| OSF Registries | osf.io/registries | All | Free, flexible templates, versioned |
| AsPredicted | aspredicted.org | Social sciences, psychology | Simple 9-question form, private until shared |
| ClinicalTrials.gov | clinicaltrials.gov | Clinical research | Required for clinical trials (FDA) |
| PROSPERO | crd.york.ac.uk/prospero | Systematic reviews | Health-related reviews only |
| AEA RCT Registry | socialscienceregistry.org | Economics | RCTs in social sciences |
1. Design your study
2. Write the pre-registration document
3. Have a colleague review it
4. Submit to a registration platform
5. Receive a time-stamped registration (a URL, plus a DOI or registry number on most platforms)
6. Collect and analyze data following the pre-registered plan
7. Report results transparently:
- Confirmatory analyses (pre-registered)
- Exploratory analyses (clearly labeled as exploratory)
8. Link the pre-registration in your manuscript
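Step 7 depends on keeping confirmatory and exploratory results apart. A minimal sketch of that bookkeeping follows, assuming each planned analysis has a short identifier listed in the pre-registration. The `Analysis` class, `label_analyses`, and analysis names such as `H1_mixed_anova` are hypothetical. Matching on names is a simplification of comparing the reported analyses against the registration text by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    name: str       # hypothetical identifier, e.g. "H1_mixed_anova"
    p_value: float


def label_analyses(reported, preregistered):
    """Tag each reported analysis as confirmatory or exploratory.

    An analysis is confirmatory only if its identifier appears in the
    pre-registered plan; everything else is labeled exploratory.
    """
    registered = set(preregistered)
    return [
        (a.name, "confirmatory" if a.name in registered else "exploratory", a.p_value)
        for a in reported
    ]


if __name__ == "__main__":
    plan = ["H1_mixed_anova", "H2_pre_post_ttest"]
    results = [
        Analysis("H1_mixed_anova", 0.012),
        Analysis("age_moderation", 0.041),  # not in the plan, so exploratory
    ]
    for name, label, p in label_analyses(results, plan):
        print(f"{name}: {label} (p = {p})")
```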
Registered Reports are a publication format where peer review occurs before data collection:
Stage 1 (Before Data Collection):
- Submit introduction, methods, and analysis plan
- Peer review evaluates the research question and methodology
- If accepted: "In-Principle Acceptance" (IPA)
- The paper will be published regardless of the results, provided the approved protocol is followed
Stage 2 (After Data Collection):
- Collect data following the approved protocol
- Analyze and report results
- Add discussion section
- Final peer review checks adherence to protocol
- Publication
Over 300 journals now accept Registered Reports. Check the registry at cos.io/rr.
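The two-stage process can be read as a small state machine: a manuscript can only reach Stage 2 after in-principle acceptance. The sketch below is a simplified model, not any journal's editorial system. Real workflows also include revisions, withdrawal, and reporting of deviations. The stage names and the `advance`/`run` helpers are hypothetical.

```python
from enum import Enum, auto


class Stage(Enum):
    STAGE1_SUBMITTED = auto()        # introduction, methods, analysis plan under review
    IN_PRINCIPLE_ACCEPTED = auto()   # IPA granted before data collection
    STAGE2_SUBMITTED = auto()        # data collected, results and discussion added
    PUBLISHED = auto()
    REJECTED = auto()


# Allowed transitions in this simplified model.
TRANSITIONS = {
    Stage.STAGE1_SUBMITTED: {Stage.IN_PRINCIPLE_ACCEPTED, Stage.REJECTED},
    Stage.IN_PRINCIPLE_ACCEPTED: {Stage.STAGE2_SUBMITTED},
    # Stage 2 review checks adherence to the approved protocol,
    # not whether the results came out "positive".
    Stage.STAGE2_SUBMITTED: {Stage.PUBLISHED, Stage.REJECTED},
    Stage.PUBLISHED: set(),
    Stage.REJECTED: set(),
}


def advance(current: Stage, nxt: Stage) -> Stage:
    """Move to the next stage, refusing transitions the process does not allow."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {nxt.name}")
    return nxt


def run(path):
    """Replay a sequence of stages and return the final one."""
    state = path[0]
    for nxt in path[1:]:
        state = advance(state, nxt)
    return state
```

For example, `run([Stage.STAGE1_SUBMITTED, Stage.IN_PRINCIPLE_ACCEPTED, Stage.STAGE2_SUBMITTED, Stage.PUBLISHED])` succeeds. Jumping from `STAGE1_SUBMITTED` straight to `STAGE2_SUBMITTED` raises an error, because Stage 2 requires IPA first.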
FAIR principles aim to make research data Findable, Accessible, Interoperable, and Reusable:
- F1: Data are assigned a globally unique, persistent identifier (DOI)
- F2: Data are described with rich metadata
- F3: Metadata include the identifier of the data
- F4: Data are registered or indexed in a searchable resource
Actions:
- Deposit data in a repository that assigns DOIs
- Write a comprehensive README and data dictionary
- Use standard metadata schemas (Dublin Core, DataCite)
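A quick pre-deposit check can catch missing core metadata. The DataCite Metadata Schema defines six mandatory properties: Identifier, Creator, Title, Publisher, PublicationYear, and ResourceType. The sketch below uses simplified, hypothetical dictionary keys rather than DataCite's actual XML/JSON field names. The DOI `10.1234/example.dataset` is a placeholder. Including the identifier in the record itself reflects F3.

```python
# The six DataCite mandatory properties, as simplified hypothetical keys.
REQUIRED = ("identifier", "creators", "titles", "publisher",
            "publication_year", "resource_type")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    return [f for f in REQUIRED if not record.get(f)]


record = {
    "identifier": "10.1234/example.dataset",   # placeholder DOI, not a real record
    "creators": ["Doe, Jane"],
    "titles": ["Pre/post test scores, Study 1"],
    "publisher": "Zenodo",
    "publication_year": 2024,
    "resource_type": "Dataset",
    # Optional extras that make the metadata richer (F2)
    "subjects": ["psychology", "open science"],
    "description": "Raw and processed data with codebook and analysis script.",
}

print(missing_fields(record))  # []
```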
- A1: Data are retrievable by their identifier using open protocols (HTTP)
- A2: Metadata remain accessible even if data are no longer available
Actions:
- Use established repositories (not personal websites)
- Specify access conditions clearly (open, restricted, embargoed)
- Even if data cannot be shared, publish metadata describing them
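A1 in practice means that a DOI alone, plus plain HTTP(S), is enough to retrieve metadata. The DOI resolver at doi.org supports content negotiation: sending an `Accept: application/vnd.citationstyles.csl+json` header returns machine-readable citation metadata instead of the landing page. The sketch below separates building the request from sending it. The helper names are hypothetical, and the formats offered can vary by registration agency.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Build a request asking the DOI resolver for CSL JSON metadata."""
    return urllib.request.Request(
        DOI_RESOLVER + doi,
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


def fetch_metadata(doi: str, timeout: float = 10.0) -> bytes:
    """Send the request; urlopen follows the resolver's redirects by default."""
    with urllib.request.urlopen(build_metadata_request(doi), timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch_metadata` with a dataset's real DOI should return a JSON document with fields such as title and author. It needs network access, so test it against a DOI you have deposited.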
- I1: Data use a formal, accessible, shared language (e.g., CSV, JSON, RDF)
- I2: Data use vocabularies that follow FAIR principles
- I3: Data include qualified references to other data
Actions:
- Use standard file formats (CSV, not proprietary Excel)
- Use standard variable names and coding schemes
- Link to related datasets using DOIs
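The first two actions can be combined in an export step: write plain CSV with consistent snake_case variable names and one explicit missing-value code. The `NA` code matches the README template in this guide. The normalization rules in `to_snake_case` and the input column labels are illustrative assumptions.

```python
import csv
import io
import re


def to_snake_case(name: str) -> str:
    """Normalize a column label such as 'Score (Post)' to 'score_post'."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())   # non-alphanumerics -> _
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)  # camelCase -> camel_Case
    return name.strip("_").lower()


def write_csv(rows: list, out) -> None:
    """Write dict records as CSV with normalized headers and NA for missing values."""
    keys = list(rows[0])  # column order taken from the first record
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([to_snake_case(k) for k in keys])
    for row in rows:
        writer.writerow(["NA" if row.get(k) is None else row[k] for k in keys])


rows = [
    {"participantID": "P001", "Age": 34, "Score (Post)": 71.5},
    {"participantID": "P002", "Age": 29, "Score (Post)": None},
]
buf = io.StringIO()
write_csv(rows, buf)
print(buf.getvalue())
```

This prints a header `participant_id,age,score_post` followed by one line per participant, with `NA` where the post-test score is missing.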
- R1: Data are richly described with a plurality of accurate and relevant attributes
- R1.1: Data are released with a clear, accessible data usage license
- R1.2: Data are associated with detailed provenance
- R1.3: Data meet domain-relevant community standards
Actions:
- Include a data dictionary with variable descriptions
- Apply a license (CC-BY 4.0 recommended)
- Describe data collection procedures, cleaning steps, and known issues
- Include analysis code alongside data
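Describing cleaning steps is easier when the cleaning code records them as it runs. Below is a minimal sketch of a provenance log that stores each exclusion step with row counts before and after. The resulting summary can be copied into the README's missing-data section. `CleaningLog` and the field names in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CleaningLog:
    """Record each cleaning step and how many rows it removed."""
    steps: list = field(default_factory=list)

    def apply(self, rows: list, description: str, keep: Callable[[dict], bool]) -> list:
        """Filter rows with `keep` and log the step with before/after counts."""
        kept = [r for r in rows if keep(r)]
        self.steps.append((description, len(rows), len(kept)))
        return kept

    def report(self) -> str:
        return "\n".join(
            f"{desc}: {before} -> {after} rows ({before - after} removed)"
            for desc, before, after in self.steps
        )


rows = [
    {"participant_id": "P001", "attention_ok": True},
    {"participant_id": "P002", "attention_ok": False},
    {"participant_id": "P003", "attention_ok": True},
]
log = CleaningLog()
clean = log.apply(rows, "Exclude failed attention checks", lambda r: r["attention_ok"])
print(log.report())  # Exclude failed attention checks: 3 -> 2 rows (1 removed)
```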
| Repository | Disciplines | Max Size | DOI | Cost |
|-----------|-------------|----------|-----|------|
| Zenodo | All | 50 GB | Yes | Free |
| Dryad | All (focus on sciences) | Unlimited | Yes | Sliding scale |
| Figshare | All | 20 GB (free) | Yes | Free/institutional |
| OSF | All | 5 GB (free) | Yes | Free |
| Harvard Dataverse | All (focus on social science) | 2.5 GB per file | Yes | Free |
| ICPSR | Social science | Varies | Yes | Free deposit |
| GenBank | Genomics | N/A | Accession numbers | Free |
| Protein Data Bank | Structural biology | N/A | PDB IDs | Free |
Example README for a shared dataset:

# Dataset: [Title]
## Description
Brief description of the dataset and the study it comes from.
## Citation
If you use this data, please cite:
[Full citation of the associated publication]
## File Description
- `data_raw.csv` - Raw data as collected (N = 500, 45 variables)
- `data_processed.csv` - Cleaned data after exclusions (N = 467, 38 variables)
- `codebook.csv` - Variable descriptions, types, and valid ranges
- `analysis_script.R` - Complete analysis code reproducing all results
## Variables (data_processed.csv)
| Variable | Type | Description | Valid Range |
|----------|------|-------------|-------------|
| participant_id | string | Unique participant identifier | P001-P500 |
| age | integer | Age in years | 18-65 |
| condition | categorical | Experimental condition | control, treatment_a, treatment_b |
| score_pre | numeric | Pre-test score | 0-100 |
| score_post | numeric | Post-test score | 0-100 |
## Missing Data
- 33 participants excluded for failing attention checks
- 12 missing values in `score_post` (participants did not complete)
- Missing coded as NA
## License
CC-BY 4.0 International
## Contact
[Name, email, ORCID]
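A codebook is most useful when the data are checked against it before deposit. The sketch below encodes the valid ranges from the Variables table of the README template and reports every value that falls outside them. It uses the standard `csv` module and accepts `NA` as the missing-value code in any column, which is a simplifying assumption. The `validate` helper and the sample rows are hypothetical.

```python
import csv
import io
import re

# Valid ranges taken from the Variables table of the README template.
CODEBOOK = {
    "participant_id": lambda v: re.fullmatch(r"P\d{3}", v) is not None
                                and 1 <= int(v[1:]) <= 500,
    "age": lambda v: v.isdigit() and 18 <= int(v) <= 65,
    "condition": lambda v: v in {"control", "treatment_a", "treatment_b"},
    "score_pre": lambda v: 0 <= float(v) <= 100,
    "score_post": lambda v: 0 <= float(v) <= 100,
}
MISSING = "NA"  # missing-value code documented in the README


def validate(csv_text: str) -> list:
    """Return (line_number, column, value) for every out-of-range value."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col, is_valid in CODEBOOK.items():
            value = row.get(col) or ""  # absent or short rows become ""
            if value == MISSING:
                continue
            try:
                ok = is_valid(value)
            except (ValueError, TypeError):  # e.g. text in a numeric column
                ok = False
            if not ok:
                problems.append((line_no, col, value))
    return problems


sample = """participant_id,age,condition,score_pre,score_post
P001,34,control,55,71.5
P002,17,treatment_a,48,NA
P003,40,treatment_c,60,101
"""
print(validate(sample))
```

For this sample the check flags the under-age participant on line 3, plus the unknown condition and the post-test score above 100 on line 4.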
When data cannot be fully shared (e.g., due to participant privacy):
Many journals award badges for open science practices:
| Badge | Meaning |
|-------|---------|
| Open Data | Data publicly available |
| Open Materials | Research materials publicly available |
| Preregistered | Study pre-registered before data collection |
| Preregistered + Analysis Plan | Preregistered with detailed analysis plan |
These badges (developed by the Center for Open Science, COS) appear on published articles and signal commitment to transparency.