Design ontologies and knowledge graphs for research data modeling
A skill for designing ontologies and knowledge graphs to model research domain knowledge. Covers ontology engineering methodologies, OWL and RDF basics, reusing existing ontologies, and practical tools for building, validating, and querying knowledge graphs.
An ontology is a formal, explicit specification of a shared
conceptualization. In practical terms, it defines:
- Classes: Categories of things (e.g., Gene, Disease, Drug)
- Properties: Relationships between things (e.g., causes, treats)
- Individuals: Specific instances (e.g., TP53, Breast Cancer)
- Axioms: Rules and constraints (e.g., every Drug has exactly
one molecular formula)
Purpose in research:
- Standardize terminology across research groups
- Enable data integration from heterogeneous sources
- Support automated reasoning and inference
- Facilitate knowledge discovery through graph queries
- Provide machine-readable domain models
- Taxonomy: Hierarchical classification (is-a relationships only). Example: Animal > Mammal > Primate > Human
- Ontology: Formal model with classes, properties, and axioms. Supports reasoning (e.g., if X treats Y and Y is-a Disease, then X is a DrugCandidate)
- Knowledge Graph: An ontology populated with instance data, often millions of (subject, predicate, object) triples. Examples: Wikidata, DBpedia, Google Knowledge Graph
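The reasoning example above (X treats Y, Y is-a Disease, therefore X is a DrugCandidate) can be sketched in plain Python. This is only an illustration of rule-based inference over triples, not how an OWL reasoner works internally; the `PainDisorder` class and the bare string names are assumptions made up for the example.

```python
# Triples are plain (subject, predicate, object) tuples.
# All names below are illustrative, not taken from a real ontology.
TRIPLES = {
    ("Aspirin", "treats", "Headache"),
    ("Headache", "type", "PainDisorder"),
    ("PainDisorder", "subClassOf", "Disease"),
    ("TP53", "type", "Gene"),
}


def superclasses(cls: str, triples: set) -> set:
    """Return cls plus every class reachable through subClassOf links."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def is_a(individual: str, cls: str, triples: set) -> bool:
    """True if the individual is typed with cls or with any subclass of cls."""
    return any(
        cls in superclasses(o, triples)
        for s, p, o in triples
        if s == individual and p == "type"
    )


def infer_drug_candidates(triples: set) -> set:
    """Apply the rule: X treats Y and Y is-a Disease => X type DrugCandidate."""
    return {
        (s, "type", "DrugCandidate")
        for s, p, o in triples
        if p == "treats" and is_a(o, "Disease", triples)
    }


inferred = infer_drug_candidates(TRIPLES)
print(inferred)
```

A plain taxonomy would stop at the `subClassOf` walk; the extra rule that derives a new fact is what the ontology adds.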
def ontology_design_process(domain: str) -> dict:
    """
    Steps for designing a domain ontology.

    Args:
        domain: The research domain to model
    """
    return {
        "step_1_scope": {
            "description": "Define scope and competency questions",
            "questions": [
                "What domain does the ontology cover?",
                "What questions should the ontology be able to answer?",
                "Who will use it and for what purpose?"
            ],
            "example": (
                "Domain: Drug-disease interactions. "
                "Competency question: 'What drugs target proteins "
                "associated with Alzheimer disease?'"
            )
        },
        "step_2_reuse": {
            "description": "Search for existing ontologies to reuse",
            "resources": [
                "BioPortal (bioportal.bioontology.org) -- biomedical ontologies",
                "Linked Open Vocabularies (lov.linkeddata.es) -- general",
                "OBO Foundry (obofoundry.org) -- life sciences",
                "Schema.org -- web-scale vocabulary"
            ]
        },
        "step_3_enumerate": {
            "description": "List key terms, concepts, and relationships",
            "method": "Brainstorm with domain experts; review literature"
        },
        "step_4_model": {
            "description": "Define class hierarchy and properties",
            "tools": ["Protege", "WebVOWL", "TopBraid Composer"]
        },
        "step_5_formalize": {
            "description": "Encode in OWL/RDF with axioms and constraints"
        },
        "step_6_validate": {
            "description": "Test against competency questions and real data",
            "methods": ["SPARQL queries", "Reasoner (HermiT, Pellet)", "Unit tests"]
        },
        "step_7_publish": {
            "description": "Publish with persistent URI and documentation",
            "best_practice": "Use w3id.org or purl.org for persistent identifiers"
        }
    }
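Steps 1 and 6 above tie together: a competency question written down during scoping can later become a unit test. The sketch below shows one possible shape for such a test in plain Python. The toy graph, the `targets` property, and the helper names are all assumptions introduced for illustration; a real project would more likely run SPARQL against a triplestore.

```python
from typing import Callable

Triple = tuple[str, str, str]

# Toy data (hypothetical); in practice this would be the ontology plus real instances.
GRAPH: set[Triple] = {
    ("Donepezil", "targets", "AChE"),
    ("AChE", "associatedWith", "AlzheimerDisease"),
    ("TP53", "associatedWith", "BreastCancer"),
}


def drugs_targeting_proteins_for(disease: str, graph: set[Triple]) -> set[str]:
    """Answer: 'What drugs target proteins associated with <disease>?'"""
    proteins = {s for s, p, o in graph if p == "associatedWith" and o == disease}
    return {s for s, p, o in graph if p == "targets" and o in proteins}


# Each competency question pairs a query with the answer domain experts expect.
COMPETENCY_TESTS: dict[str, tuple[Callable[[], set[str]], set[str]]] = {
    "Drugs targeting Alzheimer-associated proteins": (
        lambda: drugs_targeting_proteins_for("AlzheimerDisease", GRAPH),
        {"Donepezil"},
    ),
}


def run_competency_tests() -> dict[str, bool]:
    """Return a pass/fail flag per competency question."""
    return {
        name: query() == expected
        for name, (query, expected) in COMPETENCY_TESTS.items()
    }


results = run_competency_tests()
print(results)
```

If a question cannot be expressed as a query at all, that usually points to a missing class or property rather than missing data.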
RDF (Resource Description Framework) represents knowledge as triples:
(Subject, Predicate, Object)
Examples:
(:Aspirin, :treats, :Headache)
(:TP53, rdf:type, :Gene)
(:TP53, :associatedWith, :BreastCancer)
(:Aspirin, :hasChemicalFormula, "C9H8O4")
Serialization formats:
- Turtle (.ttl): Human-readable, most common for authoring
- JSON-LD (.jsonld): Web-friendly, API-compatible
- RDF/XML (.rdf): Verbose, legacy format
- N-Triples (.nt): Simple, good for large datasets
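To make the triple model concrete, here is a minimal sketch that writes a few of the example triples as N-Triples, the simplest of the formats listed above (one full-IRI triple per line, terminated by a dot). The helper names are hypothetical, and the literal escaping covers only backslashes and double quotes, so treat it as an illustration rather than a complete serializer.

```python
BASE = "http://example.org/research#"  # example namespace, not a real vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(local_name: str) -> str:
    """Wrap a local name in the example namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def literal(value: str) -> str:
    """Quote a plain string literal (escapes only backslash and double quote)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialize already-formatted (s, p, o) terms, one triple per line."""
    lines = [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines) + "\n"


triples = [
    (iri("Aspirin"), iri("treats"), iri("Headache")),
    (iri("TP53"), f"<{RDF_TYPE}>", iri("Gene")),
    (iri("Aspirin"), iri("hasChemicalFormula"), literal("C9H8O4")),
]
output = to_ntriples(triples)
print(output)
```

Because every line is independent, N-Triples files can be split, streamed, and concatenated, which is why the format suits large datasets.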
@prefix : <http://example.org/research#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
# Classes
:Gene a owl:Class ;
    rdfs:label "Gene" ;
    rdfs:comment "A unit of heredity in a living organism." .

:Disease a owl:Class ;
    rdfs:label "Disease" .

:Drug a owl:Class ;
    rdfs:label "Drug" .

# Properties
:associatedWith a owl:ObjectProperty ;
    rdfs:domain :Gene ;
    rdfs:range :Disease .

:treats a owl:ObjectProperty ;
    rdfs:domain :Drug ;
    rdfs:range :Disease .

# Individuals
:TP53 a :Gene ;
    rdfs:label "TP53" ;
    :associatedWith :BreastCancer .

:BreastCancer a :Disease ;
    rdfs:label "Breast Cancer" .
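One point worth illustrating about the Turtle example: under RDFS semantics, `rdfs:domain` and `rdfs:range` are inference rules, not validation constraints. Using a property lets a reasoner conclude the types of its subject and object. The sketch below mimics that entailment in plain Python with hypothetical lookup tables; it is a simplified model, not a reasoner.

```python
# Domain and range declarations mirroring the Turtle example (local names only).
DOMAIN = {"associatedWith": "Gene", "treats": "Drug"}
RANGE = {"associatedWith": "Disease", "treats": "Disease"}


def infer_types(facts):
    """Apply RDFS-style domain/range entailment to (subject, property, object) facts."""
    inferred = set()
    for s, p, o in facts:
        if p in DOMAIN:
            inferred.add((s, DOMAIN[p]))  # subject gets the property's domain class
        if p in RANGE:
            inferred.add((o, RANGE[p]))   # object gets the property's range class
    return inferred


# The asserted fact alone is enough to type both ends.
types = infer_types([("TP53", "associatedWith", "BreastCancer")])
print(types)

# A likely modeling mistake: a drug used as the subject of associatedWith.
# Nothing is rejected; the drug is simply inferred to be a Gene as well.
mistake = infer_types([("Aspirin", "associatedWith", "Headache")])
print(mistake)
```

Catching the second case as an error requires something extra, such as declaring `:Gene` and `:Drug` disjoint in OWL and running a reasoner, or validating the data with a shapes language.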
# Prefix declarations; each query below needs them when run on its own
PREFIX : <http://example.org/research#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Find all genes associated with Breast Cancer
SELECT ?gene ?geneLabel
WHERE {
    ?gene a :Gene .
    ?gene :associatedWith :BreastCancer .
    ?gene rdfs:label ?geneLabel .
}

# Find drugs that treat diseases associated with gene TP53
SELECT ?drug ?disease
WHERE {
    :TP53 :associatedWith ?disease .
    ?drug :treats ?disease .
}

# Count diseases per gene
SELECT ?gene (COUNT(?disease) AS ?diseaseCount)
WHERE {
    ?gene a :Gene .
    ?gene :associatedWith ?disease .
}
GROUP BY ?gene
ORDER BY DESC(?diseaseCount)
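The queries above all work the same way: each triple pattern is matched against the graph, and a shared variable such as `?disease` forces the matches to join. The following sketch reproduces that matching in plain Python over a small hypothetical dataset. It is meant to show the idea behind basic graph patterns, not to stand in for a SPARQL engine.

```python
from collections import Counter

# Hypothetical sample data using local names instead of full IRIs.
TRIPLES = [
    ("TP53", "type", "Gene"),
    ("BRCA1", "type", "Gene"),
    ("TP53", "associatedWith", "BreastCancer"),
    ("TP53", "associatedWith", "LiFraumeniSyndrome"),
    ("BRCA1", "associatedWith", "BreastCancer"),
    ("Tamoxifen", "treats", "BreastCancer"),
    ("Aspirin", "treats", "Headache"),
]


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Drugs that treat diseases associated with TP53 (second query above).
drug_rows = list(match(
    [("TP53", "associatedWith", "?disease"), ("?drug", "treats", "?disease")],
    TRIPLES,
))

# Diseases per gene, most first (third query above).
gene_counts = Counter(
    row["?gene"]
    for row in match(
        [("?gene", "type", "Gene"), ("?gene", "associatedWith", "?disease")],
        TRIPLES,
    )
).most_common()

print(drug_rows)
print(gene_counts)
```

The nested loop here is quadratic in the worst case; real triplestores rely on indexes over subject, predicate, and object to avoid scanning the whole graph per pattern.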
| Tool | Type | Best For |
|------|------|---------|
| Protege | Desktop IDE | Full ontology development and reasoning |
| WebVOWL | Web viewer | Visualizing ontology structure |
| RDFLib (Python) | Library | Programmatic RDF manipulation |
| Apache Jena | Framework | SPARQL endpoint and reasoning |
| Neo4j | Graph database | Property graph modeling (not RDF) |
| Blazegraph/GraphDB | Triplestore | Storing and querying RDF data |
- Follow the FAIR principles (Findable, Accessible, Interoperable, Reusable) when publishing ontologies.
- Reuse existing terms from established ontologies before creating new ones.
- Document every class and property with labels, definitions, and examples.
- Use a reasoner to check logical consistency.
- Version your ontology and maintain a changelog.
- Publish both human-readable documentation (HTML) and machine-readable files (OWL/TTL) at a persistent URI.
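The documentation practice above is easy to check automatically. As a rough sketch, assuming term annotations have already been loaded into a plain dictionary (the `terms` structure and the `REQUIRED` keys are hypothetical), a small lint function can list which terms still lack a label or a definition:

```python
# Hypothetical annotation table: term name -> annotations found in the ontology file.
terms = {
    "Gene": {"label": "Gene", "comment": "A unit of heredity in a living organism."},
    "Disease": {"label": "Disease"},
    "associatedWith": {},
}

# Annotations every class and property is expected to carry (an assumed policy).
REQUIRED = ("label", "comment")


def missing_annotations(terms: dict) -> dict:
    """Map each under-documented term to the annotations it lacks."""
    report = {}
    for name, annotations in terms.items():
        missing = [key for key in REQUIRED if not annotations.get(key)]
        if missing:
            report[name] = missing
    return report


report = missing_annotations(terms)
print(report)
```

Running a check like this in continuous integration keeps documentation gaps from accumulating between releases.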