skills/high-dim-intuition-rebuild/SKILL.md
Diagnoses where a learner's 3D geometric intuition is misleading them in a high-dimensional context (concentration of measure, Gaussian shells, distance-metric breakdown, manifold hypothesis, volume-in-corners, random-projection preservation), then surgically replaces the false picture with the correct one. Use when the user is reasoning about high-dim spaces (embeddings, latent vectors, neural net activations, large-scale data clouds, optimization landscapes) and either makes a claim that's true in 3D but false in 1000D, or expresses confusion at a high-dim phenomenon that "shouldn't" happen.
3D intuition is the user's most powerful asset for low-dim geometry and their most dangerous liability for high-dim geometry. In 1000 dimensions, almost all of a ball's volume lies in a thin layer just inside its surface, samples from a standard Gaussian live on a thin shell, "nearest neighbor" stops being meaningful for unstructured data, and most of a hypercube's volume is in its corners. None of these are intuitive from 3D, and learners who don't know to expect them will reason wrongly about embeddings, latent spaces, and data clouds.
This skill is the surgical version of that repair: name the false intuition, demonstrate where it breaks, install the correct picture.
Quick example (concentration of measure):
Claim from learner: "If I sample from a high-dimensional Gaussian, I should expect samples near the origin — that's where the density peaks."
The false 3D intuition: In 2D or 3D, samples do cluster near the origin. The density peak there gives a strong gravitational pull on samples.
Why it breaks in high D: Density is per unit volume. The amount of volume at radius r grows as r^(d−1). The density falls off as r increases, but in high d the volume available at radius r grows fast enough that density × volume is maximized on the shell near r = √d, not at the origin.
Correct picture: In d dimensions, almost all the mass of a standard Gaussian is on a thin shell at radius √d. The 3D mental image of "Gaussian = blob centered at the origin" is wrong in high D; the right image is "Gaussian = thin spherical shell of radius √d, very little mass anywhere else".
Verify with numpy: `np.linalg.norm(np.random.randn(10000, 1000), axis=1).mean()` returns ≈ 31.6 ≈ √1000. The sample norms cluster tightly around that radius.
Copy this checklist and track your progress:
Rebuild Progress:
- [ ] Step 1: Identify the misleading 3D intuition the learner is using
- [ ] Step 2: Name the high-dim phenomenon that contradicts it
- [ ] Step 3: Show *why* it breaks (the mechanism, not just the fact)
- [ ] Step 4: Install the correct high-dim picture
- [ ] Step 5: Verify with a numpy / sympy demonstration when possible
- [ ] Step 6: Generalize — what other 3D intuitions does this break?
Step 1: Identify the misleading 3D intuition the learner is using
Most high-dim confusions trace to one of five universal 3D intuitions (see The Five Big Lies of 3D Intuition below). Diagnose which one is in play; the table at the end of this document lists all five.
Sometimes the learner doesn't state the intuition; they just express confusion at a phenomenon ("why is cosine similarity always ~0?"). Reverse-engineer to find which intuition would have predicted the wrong answer.
Step 2: Name the high-dim phenomenon that contradicts it
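Each false intuition has a corresponding true high-dim phenomenon. Name it explicitly: concentration of measure, distance concentration, angle concentration, cube-corner dominance, or the manifold hypothesis.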
Naming matters. The learner who can name the phenomenon can look it up later and recognize it in new contexts.
Step 3: Show why it breaks (the mechanism, not just the fact)
This is the load-bearing step. Don't just assert that the 3D intuition is wrong — show the mechanism.
For concentration of measure:
"Density × volume is what determines where samples land. In d dimensions, a thin shell at radius r has volume proportional to r^(d-1). For Gaussian density e^(-r²/2), the product r^(d-1)·e^(-r²/2) is maximized at r = √(d−1) ≈ √d. In 3D, that's √2 ≈ 1.4: close to the origin, so intuition holds. In 1000D, that's √999 ≈ 31.6: far from the origin, and intuition collapses."
The mechanism explanation always turns on one question: how does the dimension d enter the formula? The answer reveals what fights what (here, density vs. volume).
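The density-versus-volume argument for the Gaussian can be checked numerically. The sketch below grid-searches the radius that maximizes r^(d-1)·e^(-r²/2) and compares it with √(d−1), which is what setting the derivative of the log, (d−1)/r − r, to zero gives. The function name `radial_mode` and the grid parameters are illustrative choices, not part of this skill.

```python
import numpy as np

def radial_mode(d: int, r_max: float = 60.0, n: int = 600_001) -> float:
    """Grid-search the radius that maximizes r^(d-1) * exp(-r^2 / 2)."""
    r = np.linspace(1e-6, r_max, n)
    # Work in logs: r^(d-1) overflows float64 long before d = 1000.
    log_mass = (d - 1) * np.log(r) - r**2 / 2
    return float(r[np.argmax(log_mass)])

for d in (3, 100, 1000):
    # Columns: dimension, numerical maximizer, closed-form sqrt(d - 1)
    print(d, round(radial_mode(d), 3), round(float(np.sqrt(d - 1)), 3))
```

The two columns should agree to within the grid spacing, which makes the point that d enters only through the exponent on r.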
Step 4: Install the correct high-dim picture
Replace the broken 3D picture with one that's correct in any dimension:
The new picture should be visualizable — use 1D or 2D analogies that survive the dimension change:
Step 5: Verify with a numpy / sympy demonstration when possible
A 3-line numpy script is worth a thousand words of intuition repair. Examples:
```python
import numpy as np

rng = np.random.default_rng(0)

# Concentration of measure: norms of 1000-D standard Gaussian samples
samples = rng.standard_normal((10000, 1000))
norms = np.linalg.norm(samples, axis=1)
print(norms.mean())  # ≈ √1000 ≈ 31.6
print(norms.std())   # ≈ 0.7, very small relative to the mean

# Angle concentration: cosine between independent random vectors
v1 = rng.standard_normal((10000, 1000))
v2 = rng.standard_normal((10000, 1000))
cos = (v1 * v2).sum(axis=1) / (np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1))
print(cos.mean(), cos.std())  # ≈ 0, ≈ 1/√1000 ≈ 0.032
```
If you have Bash access, run the script and report the actual numbers. Concrete numbers anchor the new intuition.
Step 6: Generalize — what other 3D intuitions does this break?
Each of the Five Big Lies has cascading consequences. The user who learns about concentration of measure should also be told:
The generalization step is what makes the rebuild durable — it teaches the user to flag similar misuses of 3D intuition in the future.
For one full rebuild per phenomenon, see resources/phenomena.md. For numpy demonstration scripts, see resources/demos.md.
The Five Big Lies of 3D Intuition
Almost every high-dim confusion is a version of one of these.
Lie 1: "Samples cluster near the mean"
- True in 3D: Yes. Samples from a Gaussian or from a uniform distribution on a ball do sit close to the mean.
- False in high D: Samples from a standard Gaussian concentrate on a thin shell at distance √d from the mean. The "near the mean" zone is almost empty.
- Phenomenon: Concentration of measure.
- Why it matters: Any reasoning about "typical" samples being "near the center" is wrong in high D. Latent-space interpolation, sampling, and mode-seeking optimization all need rethinking.
Lie 2: "Nearest neighbor is meaningful"
- True in 3D: The nearest neighbor of a query is meaningfully closer than the others.
- False in high D: For unstructured (roughly i.i.d.) data, the ratio of nearest distance to farthest distance approaches 1. Every point is approximately the same distance from any query.
- Phenomenon: Curse of dimensionality (specifically, distance concentration).
- Why it matters: Vanilla k-NN, vanilla similarity search, and naive distance-based clustering all degrade in high D. This is why learned embeddings (which break the concentration by introducing structure) are essential.
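Distance concentration is easy to see on synthetic data. The sketch below draws i.i.d. standard Gaussian points (an assumption chosen to represent "unstructured" data, not real embeddings) and reports the nearest-to-farthest distance ratio from one random query as the dimension grows. The helper name `nearest_to_farthest_ratio` is hypothetical.

```python
import numpy as np

def nearest_to_farthest_ratio(d: int, n_points: int = 1000, seed: int = 0) -> float:
    """Ratio of nearest to farthest Euclidean distance from one query point."""
    rng = np.random.default_rng(seed)
    points = rng.standard_normal((n_points, d))  # unstructured i.i.d. data (assumption)
    query = rng.standard_normal(d)
    dists = np.linalg.norm(points - query, axis=1)
    return float(dists.min() / dists.max())

for d in (2, 10, 100, 1000):
    print(d, round(nearest_to_farthest_ratio(d), 3))
```

The ratio should climb toward 1 as d increases. Data with real cluster or manifold structure can behave quite differently, which is the point of the "why it matters" note above.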
Lie 3: "Random vectors point random directions"
- True in 3D: Two random unit vectors can have any angle from 0° to 180° with reasonable probability.
- False in high D: Two random unit vectors are nearly orthogonal (angle ≈ 90°) with high probability. Variance of cos θ is ~1/d.
- Phenomenon: Angle concentration.
- Why it matters: This is why cosine similarity for random embeddings is uninformative, and why learned embeddings, which break angle concentration, can produce meaningful similarities.
Lie 4: "A ball fills its bounding cube"
- True in 3D: The ball inscribed in a cube occupies ~52% (π/6) of the cube's volume.
- False in high D: The volume of the inscribed ball goes to zero relative to its bounding cube as d → ∞. Almost all the cube's volume is in the corners.
- Phenomenon: Cube-corner dominance.
- Why it matters: Uniformly sampling in a hypercube almost never lands inside the inscribed hypersphere. Bounding-box analyses and "is this in the ball?" tests behave very differently than 3D suggests.
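The ball-to-cube ratio has a closed form, so it can be checked both exactly and by sampling. The sketch below uses the standard unit-ball volume π^(d/2) / Γ(d/2 + 1) and the bounding cube [−1, 1]^d with volume 2^d. The function names are illustrative.

```python
import math
import numpy as np

def ball_fraction_exact(d: int) -> float:
    """Volume of the unit ball divided by the volume of its bounding cube [-1, 1]^d."""
    log_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_cube = d * math.log(2.0)
    return math.exp(log_ball - log_cube)

def ball_fraction_monte_carlo(d: int, n: int = 200_000, seed: int = 0) -> float:
    """Fraction of uniform samples from [-1, 1]^d that land inside the unit ball."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, size=(n, d))
    return float(np.mean(np.sum(pts**2, axis=1) <= 1.0))

for d in (2, 3, 5, 10, 20):
    print(d, ball_fraction_exact(d), ball_fraction_monte_carlo(d))
```

By around d = 20 the Monte Carlo estimate is typically 0 at this sample size, because the exact fraction is already on the order of 10⁻⁸.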
Lie 5: "1000 dim = 1000 degrees of freedom"
- True in 3D: A point in 3D really does have 3 degrees of freedom.
- False in high D, for real data: Real-world high-dim data (images, text embeddings, speech) tends to lie on or near a low-dim manifold embedded in the ambient space. A 1000-D image vector might have only 50 effective degrees of freedom.
- Phenomenon: Manifold hypothesis.
- Why it matters: This is a large part of why ML works at all. Generative models, dimensionality reduction, autoencoders, and self-supervised learning all exploit this: they learn the manifold, not the ambient space.
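A toy version of the manifold picture can be built synthetically. The sketch below assumes the simplest possible case, a 5-dimensional linear subspace embedded in 200 dimensions with a little noise, and runs PCA via SVD. Real data manifolds are generally curved, so PCA on real data gives only a rough upper-level view. All sizes and names here are hypothetical.

```python
import numpy as np

def explained_variance_ratio(data: np.ndarray) -> np.ndarray:
    """PCA via SVD: fraction of total variance captured by each principal component."""
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values**2
    return variances / variances.sum()

rng = np.random.default_rng(0)
n_samples, ambient_dim, intrinsic_dim = 2000, 200, 5  # hypothetical sizes

latent = rng.standard_normal((n_samples, intrinsic_dim))
embedding = rng.standard_normal((intrinsic_dim, ambient_dim))  # random linear map
noise = 0.01 * rng.standard_normal((n_samples, ambient_dim))
data = latent @ embedding + noise

ratios = explained_variance_ratio(data)
# Cumulative explained variance for the first 8 components
print(np.cumsum(ratios)[:8])
```

The cumulative curve should saturate at the fifth component even though every sample is a 200-D vector.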
Used when the learner has one specific misintuition and needs the corresponding rebuild. Structure: name the lie → show the mechanism → install the correct picture → numpy verify → generalize. Length: 5-10 minutes.
Used when one false intuition has produced multiple downstream confusions. Structure: full repair of the root lie → enumerate downstream consequences the learner has been confused about → quick repair of each. Length: 15-20 minutes.
Used when the user is about to do something where high-dim weirdness will bite (e.g., "I'm going to use Euclidean distance to compare these 2048-D embeddings"). Structure: warn that 3D intuition will fail here → name the phenomenon they're about to hit → suggest the correct alternative (e.g., cosine similarity instead of Euclidean) → explain why. Length: 5 minutes.
For one example of each pattern, see resources/phenomena.md.
Most embedding misintuitions are versions of Lie 3 (angle concentration). Embeddings work because learning breaks the angle concentration — semantically similar tokens get angles much smaller than the random baseline of 90°. The user who knows this can read embedding cosine similarities correctly.
Most latent-space misintuitions are versions of Lie 1 (concentration of measure). VAE latents are forced toward a standard normal, but standard normals in high D are shells, not blobs. Interpolating along the shell (e.g., spherical interpolation) is much more sensible than interpolating across the empty interior.
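The difference between the two interpolation styles shows up in the norm of the midpoint. The sketch below draws two standard-normal latents in 1000-D and compares linear interpolation with one common spherical-interpolation formula; the `slerp` here is a minimal version that assumes the endpoints are not parallel or antiparallel, and it is not taken from any particular library.

```python
import numpy as np

def lerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Straight-line interpolation between a and b."""
    return (1.0 - t) * a + t * b

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical interpolation; assumes a and b are not (anti)parallel."""
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(a_unit @ b_unit, -1.0, 1.0))  # angle between a and b
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

rng = np.random.default_rng(0)
d = 1000
z0, z1 = rng.standard_normal(d), rng.standard_normal(d)

print(np.linalg.norm(z0), np.linalg.norm(z1))   # both near sqrt(d)
print(np.linalg.norm(lerp(z0, z1, 0.5)))        # falls well inside the shell
print(np.linalg.norm(slerp(z0, z1, 0.5)))       # stays near the shell radius
```

Because the two endpoints are nearly orthogonal, the linear midpoint has a norm of roughly √(d/2), a radius where a standard normal places almost no mass.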
Loss landscapes inherit high-dim weirdness. The 3D bowl picture is especially misleading: in high D, a critical point is far more likely to be a saddle point than a minimum, because every one of the many curvature directions has to point upward for it to be a minimum. SGD's apparent ability to "escape local minima" is partly because local minima are rarer than saddle points.
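A toy model gives a feel for why minima become rare. The sketch below treats random symmetric Gaussian matrices as stand-in "Hessians" and counts how often all eigenvalues are positive. This is a loud assumption: real loss Hessians are not random matrices of this kind, so the numbers illustrate the counting argument only, not any actual network.

```python
import numpy as np

def fraction_all_positive(n: int, trials: int = 20_000, seed: int = 0) -> float:
    """Fraction of random symmetric n x n matrices whose eigenvalues are all positive."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((trials, n, n))
    sym = (a + np.transpose(a, (0, 2, 1))) / 2.0  # toy "Hessians" (assumption)
    eigvals = np.linalg.eigvalsh(sym)             # shape (trials, n)
    return float(np.mean(np.all(eigvals > 0, axis=1)))

for n in (1, 2, 3, 5):
    print(n, fraction_all_positive(n))
```

In one dimension half of the toy critical points are minima; the fraction should drop steeply with each added dimension, with the remainder being saddles or maxima.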
For distance-based comparisons of high-dimensional vectors, default to flagging Lie 2 (nearest-neighbor breakdown). On real data, Euclidean distance often works much worse than cosine similarity in high D. The reason: cosine similarity reads only the angular structure that learned embeddings impose, while Euclidean distance also picks up differences in vector magnitude, which often carry no semantic signal.
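The magnitude effect can be shown with three synthetic vectors. In the sketch below, `a` and `b` point in the same direction with different norms, while `c` points in an unrelated direction but is rescaled to match the norm of `a`. The vectors are random stand-ins, not real embeddings, and 2048 is just an example dimension.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
d = 2048
direction = rng.standard_normal(d)
a = 1.0 * direction                              # same direction as b
b = 5.0 * direction                              # five times the magnitude
c = rng.standard_normal(d)                       # unrelated direction
c = c * (np.linalg.norm(a) / np.linalg.norm(c))  # rescaled to match a's norm

# Euclidean ranks c as closer to a than b is; cosine ranks b as identical to a.
print(np.linalg.norm(a - b), cosine_similarity(a, b))
print(np.linalg.norm(a - c), cosine_similarity(a, c))

# After L2 normalization the two agree: ||u - v||^2 = 2 - 2 cos(u, v).
u, v = a / np.linalg.norm(a), c / np.linalg.norm(c)
print(np.linalg.norm(u - v) ** 2, 2 - 2 * cosine_similarity(a, c))
```

The last line is the practical takeaway: normalizing vectors to unit length first makes Euclidean distance a monotone function of cosine similarity, so the choice of metric stops mattering.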
| Lie (3D intuition) | Phenomenon (high-D truth) | Quick repair sentence | numpy demo |
|---|---|---|---|
| "Samples cluster near the mean" | Concentration of measure | "Samples concentrate on a thin shell at radius √d, not near the origin." | np.linalg.norm(np.random.randn(N, d), axis=1).mean() ≈ √d |
| "Nearest neighbor is meaningful" | Distance concentration | "Nearest and farthest distances converge — k-NN loses discriminative power." | Compute min and max distances from a query to random points; ratio → 1 |
| "Random vectors point random directions" | Angle concentration | "Two random unit vectors are nearly orthogonal in high D; cos θ has std ≈ 1/√d." | Compute pairwise cos of random vectors; mean ≈ 0, std ≈ 1/√d |
| "A ball fills its bounding cube" | Cube-corner dominance | "The inscribed ball occupies essentially 0% of its bounding cube in high D." | Sample uniformly from the cube [−1, 1]^d, check what fraction lands inside the unit ball: → 0 |
| "1000 dim = 1000 degrees of freedom" | Manifold hypothesis | "Real high-D data lives on a low-D manifold inside the ambient space." | Run PCA on an image dataset; explained variance saturates fast |
For full worked rebuilds per phenomenon, see resources/phenomena.md. For numpy demonstration scripts, see resources/demos.md.