skills/upgrade-scripts/SKILL.md
How to add a database migration / upgrade script in a data-fair service that uses @data-fair/lib-node/upgrade-scripts (data-fair, processings, events, catalogs, etc.). Covers the gotcha that trips up most agents: which version goes in the folder name. Use this skill whenever the user asks to add an upgrade script, write a migration, backfill a field on existing documents, reshape a Mongo collection on deploy, or anything described as "needs to run once on production after deploy". Also use it when reading or modifying an existing `upgrade/X.Y.Z/` directory.
@data-fair/lib-node/upgrade-scripts is the migration runner used by data-fair
services. At service startup it scans the upgrade/ directory for
version-named subfolders, compares them against the version recorded in the
services Mongo collection, and runs the scripts whose folder version is >=
the recorded version. After running, it stores the current package.json
version back in the services collection.
Name the folder after the version of the last release of the service at the time you write the script. Never after an anticipated future version.
When you are working on a branch, you do not know what version the change will eventually ship as. The same branch's content can end up in a minor release, a major release, or be backported to several lines at once. The forward version is genuinely unknown at authoring time, so the only stable reference is the last version that has already been released.
The runner uses semver.gte(folder, dbVersion), so the folder name
effectively encodes the claim "this migration applies whenever the database
was previously at this version or older". The last released version is the
correct answer to that claim — it is the most recent state from which a
user's database could still be coming.
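The gating rule described above can be illustrated with a small sketch. Both `gte` and `shouldRun` are hypothetical helpers written for this example: `gte` handles only plain `X.Y.Z` strings and is a stand-in for, not a copy of, the `semver` package the runner uses.

```ts
// Hypothetical stand-in for semver.gte, limited to plain X.Y.Z versions
// (no pre-release or build tags). The real runner uses the semver package.
function gte (a: string, b: string): boolean {
  const pa = a.split('.').map(Number)
  const pb = b.split('.').map(Number)
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] > pb[i]
  }
  return true // all three parts are equal
}

// A folder's scripts run when the folder version is >= the version stored in the DB.
function shouldRun (folder: string, dbVersion: string): boolean {
  return gte(folder, dbVersion)
}

// Script written while the last release was 6.4.2, so it lives in upgrade/6.4.2/.
console.log(shouldRun('6.4.2', '6.4.2')) // true: DB still at the last release
console.log(shouldRun('6.4.2', '6.5.0')) // false: DB already recorded a newer version
console.log(shouldRun('6.4.2', '6.3.9')) // true: DB coming from an older release
```

The second call shows why the script stops running once the new release has recorded its version.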
With folder = last-released-version:

- On production: before the release, package.json is at the last release (= folder name) and the DB matches. When the new release deploys, package.json bumps past the folder; the runner sees dbVersion = <last-release>, folder >= dbVersion → runs once. After the run, the DB is updated to the new package.json version. On subsequent restarts, folder < dbVersion → never runs again.
- On staging: package.json is still at the last release. So folder = package.json = DB, and the script re-runs on every staging deploy until the next release ships. This is expected and is exactly why scripts must be idempotent.

Read the last released version straight from the service's package.json. In data-fair services the version is bumped on release, not at branch start, so on any working branch the version field is exactly the last released version.
```sh
jq -r .version package.json
# → 6.4.2
# → create upgrade/6.4.2/your-script.ts
```
Take the value as it stands; do not invent or anticipate the next bump.
If several scripts target the same last-released version, put them all in
the same folder — they execute in lexicographic order. Prefix with 01-,
02- if order matters.
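A quick illustration of why zero-padded prefixes matter under lexicographic ordering. The file names are hypothetical, and plain `Array.prototype.sort()` is used here only as a stand-in for string ordering; how the runner sorts internally is an assumption.

```ts
// Zero-padded prefixes sort in the intended order.
const padded = ['10-reindex.ts', '02-backfill.ts', '01-rename.ts'].sort()
console.log(padded) // [ '01-rename.ts', '02-backfill.ts', '10-reindex.ts' ]

// Unpadded prefixes do not: '10-...' sorts before '2-...'.
const unpadded = ['10-reindex.ts', '2-backfill.ts', '1-rename.ts'].sort()
console.log(unpadded) // [ '1-rename.ts', '10-reindex.ts', '2-backfill.ts' ]
```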
Scripts are TypeScript modules with a default export that satisfies the
UpgradeScript interface:
```ts
// upgrade/6.4.2/backfill-modified.ts
import type { UpgradeScript } from '@data-fair/lib-node/upgrade-scripts.js'

const upgradeScript: UpgradeScript = {
  description: 'Backfill _modified field on existing datasets',
  async exec (db, debug) {
    debug('backfilling _modified on datasets without it')
    let count = 0
    // Only documents that still lack the field are visited, so a re-run is a no-op.
    const cursor = db.collection('datasets').find({ _modified: { $exists: false } })
    for await (const dataset of cursor) {
      const _modified = dataset.dataUpdatedAt ?? dataset.updatedAt
      if (_modified) {
        await db.collection('datasets').updateOne(
          { _id: dataset._id },
          { $set: { _modified } }
        )
        count++
      }
    }
    debug(`backfilled ${count} datasets`)
  }
}

export default upgradeScript
```
Key points:

- `description` is logged at run time; make it a single short sentence in the imperative or descriptive mood.
- `exec(db, debug)` receives a live `Db` connection from the same Mongo client the service uses, and a namespaced debug logger (`upgrade:<folder>:<filename>`).
- Filter writes with `$exists: false`, `{ field: { $ne: newValue } }`, or similar guards. A script that has already done its work must be a no-op, not a failure.

Scripts must be safe to re-run. Re-runs happen on every staging deploy until the next release ships (see above), and also when two pods start concurrently, when one crashes mid-loop, or on manual re-runs.
Make the body trivially safe to re-run:
```ts
// ✓ Filter out already-migrated documents
await db.collection('x').updateMany(
  { newField: { $exists: false } },
  { $set: { newField: defaultValue } }
)

// ✓ Use $rename only if source still exists
await db.collection('x').updateMany(
  { oldName: { $exists: true } },
  { $rename: { oldName: 'newName' } }
)

// ✗ Anything that breaks on second run
await db.collection('x').updateMany({}, { $inc: { counter: 1 } })
```
For destructive migrations (dropping a field, deleting documents), pair the write with a precondition check so a re-run is a no-op.
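One way to pair a destructive write with a precondition check is sketched below. `MinimalCollection`, `dropLegacyField` and the `legacyField` name are hypothetical and exist only for this example; the interface lists just the two methods the sketch needs and is assumed to be compatible with the MongoDB driver's `Collection`.

```ts
// Hypothetical minimal shape of the collection methods used in this sketch.
interface MinimalCollection {
  countDocuments (filter: object): Promise<number>
  updateMany (filter: object, update: object): Promise<{ modifiedCount: number }>
}

// Removes legacyField from every document that still has it.
// Returns the number of documents modified; 0 means there was nothing left to do.
async function dropLegacyField (
  collection: MinimalCollection,
  debug: (msg: string) => void
): Promise<number> {
  const filter = { legacyField: { $exists: true } }

  // Precondition: skip the write entirely when a previous run already finished.
  const remaining = await collection.countDocuments(filter)
  if (remaining === 0) {
    debug('legacyField already removed, nothing to do')
    return 0
  }

  // The same filter guards the write, so a partially completed run can resume safely.
  const res = await collection.updateMany(filter, { $unset: { legacyField: '' } })
  debug(`removed legacyField from ${res.modifiedCount} documents`)
  return res.modifiedCount
}
```

Inside an upgrade script this would be called from `exec`, for example `await dropLegacyField(db.collection('x'), debug)`.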
The runner is normally called once at service startup, before the HTTP server accepts traffic, alongside the lock manager:
```ts
import upgradeScripts from '@data-fair/lib-node/upgrade-scripts.js'
import locks from '@data-fair/lib-node/locks.js'
import db from './db.js'

await locks.init(db)
await upgradeScripts(db, locks)
```
If your service uses workspaces, the runner reads name and version from
the parent package.json first (../package.json), falling back to the
current one. The name is the key under which the version is stored in the
services collection, so don't rename a service without a manual data
migration.
Pass isFresh so the runner can skip historical scripts on a brand-new
database:
```ts
await upgradeScripts(db, locks, './', async () => {
  const count = await db.collection('datasets').estimatedDocumentCount()
  return count === 0
})
```
When isFresh returns true, no scripts run; the runner just records the
current version. When false, all scripts with folder name init run, then
normal semver-gated scripts run as usual.
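The selection rules in the paragraph above can be sketched as a pure function. `planFolders` and `gte` are hypothetical helpers, not library code; the sketch follows the text literally and does not cover how the runner treats a non-fresh database with no recorded version.

```ts
// Hypothetical stand-in for semver.gte, limited to plain X.Y.Z versions.
function gte (a: string, b: string): boolean {
  const pa = a.split('.').map(Number)
  const pb = b.split('.').map(Number)
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] > pb[i]
  }
  return true
}

// Returns the folders whose scripts would run, in order.
function planFolders (fresh: boolean, folders: string[], dbVersion: string): string[] {
  // Brand-new database: nothing runs, only the current version gets recorded.
  if (fresh) return []
  const init = folders.filter(f => f === 'init')
  const gated = folders.filter(f => f !== 'init' && gte(f, dbVersion))
  return [...init, ...gated]
}

const folders = ['init', '6.3.0', '6.4.2']
console.log(planFolders(true, folders, '6.4.2')) // []
console.log(planFolders(false, folders, '6.4.2')) // [ 'init', '6.4.2' ]
```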
The runner uses the debug package:
```sh
DEBUG=upgrade,upgrade:* npm start
```
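You will see each script's description logged as it runs, followed by the debug output of its `exec`.

If a script seems not to run, double-check:

- The folder name is a full `X.Y.Z` version (`semver.coerce` is not used on folder names: `1.0` will fail to compare; use `1.0.0`).
- The folder version is >= the DB-stored version (`db.services.findOne({ id: '<service-name>' })`).
- The `default` export matches the `UpgradeScript` shape.

Before committing a script, check that:

- The folder name is the `version` field of package.json on your working branch. Never an anticipated future version.
- `description` is one short sentence.
- `exec` is idempotent (safe to run twice).
- Files in the same folder are prefixed with `01-`, `02-`, etc. when order matters.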
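The folder-name rule from the troubleshooting list can be checked mechanically. `isValidUpgradeFolder` is a hypothetical helper, not part of the library, and it is deliberately stricter than semver: it accepts only plain `MAJOR.MINOR.PATCH` names (no pre-release tags) plus the special `init` folder.

```ts
// Plain MAJOR.MINOR.PATCH with no leading zeros; hypothetical, stricter than semver itself.
const STRICT_VERSION = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$/

function isValidUpgradeFolder (name: string): boolean {
  return name === 'init' || STRICT_VERSION.test(name)
}

console.log(isValidUpgradeFolder('6.4.2')) // true
console.log(isValidUpgradeFolder('1.0')) // false: missing the patch component
console.log(isValidUpgradeFolder('init')) // true
```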