skills/data-ontologist/SKILL.md
--- name: data-ontologist description: Polyglot persistence: when to use relational, graph, or document databases; integration patterns. user-invocable: false effort: high paths: - "**/schema*" - "**/migrations/**" allowed-tools: - Read - Glob - Grep --- # Polyglot Persistence Architecture Architectural guidance for using multiple database technologies together, with emphasis on PostgreSQL/Supabase (relational), Neo4j (graph), and MongoDB (document). Demonstrates when to use each dat
npx skillsauth add jasonwarrenuk/goblin-mode skills/data-ontologistInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Architectural guidance for using multiple database technologies together, with emphasis on PostgreSQL/Supabase (relational), Neo4j (graph), and MongoDB (document). Demonstrates when to use each database type and how to integrate them effectively.
Use this skill when:
Start with the graph. Optimise from there.
Most real-world domains are fundamentally about relationships. The graph is the truest representation of how entities connect. Start by thinking in nodes and edges — then decide where to persist based on access patterns and consistency needs.
Right database for right data concern
Don't force all data into one database type. Use:
Before choosing databases, model the domain as a graph:
// Step 1-3: Model the domain as a graph first
(:User)-[:MEMBER_OF {role: 'admin', since: date}]->(:Organisation)
(:User)-[:ENROLLED_IN {status: 'active'}]->(:Course)
(:Course)-[:REQUIRES]->(:Course)
(:User)-[:COMPLETED {score: 0.85}]->(:Module)
(:Module)-[:BELONGS_TO]->(:Course)
Then decide:
Transactional Data:
Structured Records:
Examples:
-- Users table
CREATE TABLE users (
id UUID PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
name TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Orders table
CREATE TABLE orders (
id UUID PRIMARY KEY,
user_id UUID REFERENCES users(id),
total DECIMAL(10,2),
status TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
Highly Connected Data:
Reason: Joins become expensive, recursive queries complex.
Rapidly Evolving Schemas:
Reason: Migrations expensive, rigid structure.
Relationships as Primary Concern:
Variable-Depth Traversals:
Examples:
// Social connections
(:User)-[:FOLLOWS]->(:User)
(:User)-[:BLOCKED]->(:User)
// Learning paths
(:Course)-[:REQUIRES]->(:Course)
(:User)-[:COMPLETED]->(:Course)
// Organizational structure
(:Person)-[:REPORTS_TO]->(:Person)
(:Person)-[:MEMBER_OF]->(:Team)
Simple Lookups:
Reason: Relational databases excel at indexed lookups.
Large Aggregations:
Reason: SQL aggregations and window functions more powerful.
Semi-Structured Data:
Nested/Embedded Data:
Flexible Schemas:
Examples:
// Blog post with embedded comments
{
_id: ObjectId("..."),
title: "Getting Started with SvelteKit",
slug: "getting-started-sveltekit",
author: {
id: "user-123",
name: "Alice"
},
content: "...",
tags: ["svelte", "javascript", "tutorial"],
comments: [
{
id: "comment-1",
userId: "user-456",
text: "Great article!",
createdAt: ISODate("2024-01-15")
}
],
metadata: {
views: 1250,
readingTime: "5 min"
},
publishedAt: ISODate("2024-01-10")
}
// Product with varying attributes
{
_id: ObjectId("..."),
name: "Laptop",
category: "electronics",
price: 999.99,
specs: {
cpu: "Intel i7",
ram: "16GB",
storage: "512GB SSD",
screen: "15.6 inch"
}
}
// Different product type, different fields
{
_id: ObjectId("..."),
name: "T-Shirt",
category: "clothing",
price: 29.99,
sizes: ["S", "M", "L", "XL"],
colors: ["black", "white", "blue"],
material: "100% cotton"
}
Complex Transactions:
Reason: Relational databases better at multi-document transactions.
Relationship-Heavy Data:
Reason: Graph databases handle this natively.
Highly Normalized Data:
Reason: Relational databases enforce this better.
RELATIONSHIPS → Neo4j (Graph)
STRUCTURED ENTITIES → PostgreSQL (Relational)
DOCUMENTS/CONTENT → MongoDB (Document)
VERY STABLE → PostgreSQL
EVOLVING → MongoDB
SCHEMA-OPTIONAL → Neo4j
BY KEY/ID → PostgreSQL or MongoDB
BY TRAVERSAL → Neo4j
BY CONTENT/QUERY → MongoDB
ABSOLUTE → PostgreSQL
EVENTUAL OKAY → MongoDB or Neo4j
YES → MongoDB
NO → PostgreSQL
Supabase (PostgreSQL):
Neo4j:
MongoDB:
Why All Three?:
Supabase (PostgreSQL):
Neo4j:
MongoDB:
Why All Three?:
Supabase (PostgreSQL):
Neo4j:
MongoDB:
Why All Three?:
Supabase (PostgreSQL):
MongoDB:
Neo4j (Optional):
Why This Mix?:
Use same IDs across all databases:
const userId = generateId();
// Supabase - Auth and profile
await supabase.from('users').insert({
id: userId,
email,
name
});
// Neo4j - Social graph node
await neo4j.run(`
CREATE (u:User {id: $userId, name: $name})
`, { userId, name });
// MongoDB - User preferences
await mongo.collection('user_preferences').insertOne({
_id: userId,
theme: 'dark',
notifications: {
email: true,
push: false
}
});
Store references, fetch as needed:
// MongoDB - Blog post
{
_id: ObjectId("..."),
title: "My Post",
authorId: "user-123", // Reference to PostgreSQL user
content: "...",
tags: ["javascript", "svelte"]
}
// Query pattern
const post = await mongo.collection('posts').findOne({ _id });
const author = await supabase
.from('users')
.select('*')
.eq('id', post.authorId)
.single();
Embed when:
// Good: Embed comments in post
{
title: "My Post",
comments: [
{ text: "Great!", userId: "user-123" }
]
}
Reference when:
// Good: Reference author
{
title: "My Post",
authorId: "user-123" // Author data in PostgreSQL
}
Keep databases in sync via events:
// User created in Supabase
supabase.on('INSERT', 'users', async (payload) => {
const user = payload.record;
// Create in Neo4j
await createUserNode(user);
// Create preferences in MongoDB
await mongo.collection('user_preferences').insertOne({
_id: user.id,
theme: 'light',
notifications: {}
});
});
async function getUserDashboard(userId) {
// PostgreSQL - Account info
const account = await supabase
.from('users')
.select('*')
.eq('id', userId)
.single();
// Neo4j - Social metrics
const social = await neo4j.run(`
MATCH (u:User {id: $userId})
OPTIONAL MATCH (u)-[:FOLLOWS]->(following)
OPTIONAL MATCH (follower)-[:FOLLOWS]->(u)
RETURN
count(DISTINCT following) as followingCount,
count(DISTINCT follower) as followerCount
`, { userId });
// MongoDB - Recent content
const posts = await mongo
.collection('posts')
.find({ authorId: userId })
.sort({ createdAt: -1 })
.limit(5)
.toArray();
return {
account: account.data,
social: social.records[0],
recentPosts: posts
};
}
Normalized Structure:
-- Users table
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
email TEXT UNIQUE NOT NULL,
name TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Clear foreign keys
CREATE TABLE orders (
id UUID PRIMARY KEY,
user_id UUID REFERENCES users(id),
total DECIMAL(10,2)
);
Labels and Relationships:
// Node labels
(:User)
(:Organization)
(:Course)
// Typed relationships
(:User)-[:MEMBER_OF {role}]->(:Organization)
(:User)-[:FOLLOWS {since}]->(:User)
(:Course)-[:REQUIRES]->(:Course)
Flexible Structure:
// Embedded approach (1-to-few)
{
_id: ObjectId("..."),
userId: "user-123",
title: "My Blog Post",
content: "...",
comments: [ // Embedded
{
id: "comment-1",
userId: "user-456",
text: "Great post!",
createdAt: ISODate("2024-01-15")
}
],
tags: ["tutorial", "javascript"],
metadata: {
views: 150,
likes: 23
}
}
// Reference approach (1-to-many)
{
_id: ObjectId("..."),
title: "E-commerce Order",
userId: "user-123", // Reference
items: [
{ productId: "prod-789", quantity: 2 }, // Reference
{ productId: "prod-456", quantity: 1 }
],
total: 149.99
}
Indexes:
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);
Constraints and Indexes:
CREATE CONSTRAINT user_id_unique
FOR (u:User) REQUIRE u.id IS UNIQUE;
CREATE INDEX user_email
FOR (u:User) ON (u.email);
Indexes:
// Single field
db.posts.createIndex({ authorId: 1 });
// Compound index
db.posts.createIndex({ authorId: 1, createdAt: -1 });
// Text search
db.posts.createIndex({ title: "text", content: "text" });
// Embedded field
db.posts.createIndex({ "metadata.views": -1 });
Query Patterns:
// Efficient: Uses index
db.posts.find({ authorId: userId }).sort({ createdAt: -1 });
// Inefficient: Full collection scan
db.posts.find({ "comments.text": /keyword/ });
// Better: Index comment text separately or use aggregation
// ✗ Bad: Complex multi-document transaction in MongoDB
session.startTransaction();
await orders.insertOne({ userId, total });
await inventory.updateOne({ productId }, { $inc: { stock: -1 } });
await session.commitTransaction();
Why: PostgreSQL designed for this, ACID guarantees stronger.
// ✗ Bad: Embedding user data in every post
{
title: "My Post",
author: {
id: "user-123",
name: "Alice",
email: "[email protected]",
bio: "...",
avatar: "..." // Duplicated everywhere!
}
}
Better: Store author ID, fetch user data separately.
// ✗ Bad: Using Neo4j for key-value lookup
MATCH (u:User {email: $email})
RETURN u;
Why: PostgreSQL or MongoDB faster for indexed lookups.
// ✗ Bad: Normalized MongoDB (defeats the purpose)
// users collection
{ _id: "user-123", name: "Alice" }
// posts collection
{ _id: "post-456", authorId: "user-123", title: "..." }
// comments collection
{ _id: "comment-789", postId: "post-456", text: "..." }
Why: If normalizing this much, use PostgreSQL instead.
Begin with PostgreSQL:
Add MongoDB When:
Add Neo4j When:
PostgreSQL → MongoDB:
// Export from PG
const posts = await supabase.from('posts').select('*');
// Transform and insert to MongoDB
await mongo.collection('posts').insertMany(
posts.map(post => ({
_id: post.id,
...post,
metadata: {
views: post.view_count,
likes: post.like_count
}
}))
);
MongoDB → Neo4j (relationships):
// Get follows from MongoDB
const follows = await mongo.collection('follows').find().toArray();
// Create relationships in Neo4j
await neo4j.run(`
UNWIND $follows AS follow
MATCH (follower:User {id: follow.followerId})
MATCH (followed:User {id: follow.followedId})
CREATE (follower)-[:FOLLOWS {since: follow.createdAt}]->(followed)
`, { follows });
KSBs Demonstrated:
How to Document:
Architecture is successful when:
development
Writing style guide for Jason Warren. Use this skill whenever writing prose, reports, documentation, or any substantive text for Jason — including drafting sections, editing existing content, or rewriting passages. Also use when Jason asks you to review or improve writing. Trigger on any request involving writing, drafting, editing, or composing text that isn't purely code. This includes github Pull Requests & Linear tasks
testing
{{ 𝚫𝚫𝚫 }} Check all version number props and update them
tools
{{ 𝛀𝛀𝛀 }} Map out project status, direction, and next steps
development
{{ 𝛀𝛀𝛀 }} Review code changes on the current branch against its open PR