egg/skills/mlx-dev/SKILL.md
Write correct, idiomatic Apple MLX code for Apple Silicon ML. Use when working with MLX arrays, neural networks, training loops, lazy evaluation, unified memory, mx.eval, mx.compile, Metal GPU, memory optimization, quantization, or Apple Silicon performance. Covers critical API differences from PyTorch/NumPy, array indexing gotchas (lists must be mx.array, slices create copies), NHWC format for Conv2d, __call__ not forward(), float64 CPU-only, mlx-lm integration, and debugging patterns.
npx skillsauth add luqmannurhakimbazman/ashford mlx-devInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Use uv for Python environment and package management:
# Install MLX
uv add mlx
# Run MLX scripts
uv run python train.py
# Run with specific dependencies
uv run --with mlx python script.py
Operations build a graph; nothing computes until mx.eval():
# CORRECT: Evaluate at iteration boundaries
for batch in dataset:
loss, grads = value_and_grad_fn(model, batch)
optimizer.update(model, grads)
mx.eval(loss, model.parameters()) # ALL computation here
# WRONG: Evaluating too frequently
for _ in range(100):
a = a + b
mx.eval(a) # Massive overhead!
Implicit eval triggers: print(a), a.item(), np.array(a), if a > 0:.
# Lists must be mx.array
a[[0, 1]] # ValueError!
a[mx.array([0, 1])] # Works
# Slice indices must be Python ints
i = mx.array(2)
x[i:i+2] # ValueError!
x[i.item():i.item()+2] # Works (forces eval)
# Slices create COPIES, not views (opposite of NumPy)
b = a[:]
b[2] = 0 # a is unchanged!
# Boolean mask READS not supported
a[mask] # Not supported - use mx.where()
# No bounds checking - out-of-bounds returns garbage
For accumulating updates, use at[] syntax:
a = a.at[idx].add(1) # Properly accumulates at duplicate indices
See references/array-indexing.md for complete patterns.
# Conv2d uses NHWC (not NCHW like PyTorch)
x_mlx = mx.array(x_torch.numpy().transpose(0, 2, 3, 1))
# Override __call__, not forward()
class MyModel(nn.Module):
def __call__(self, x): # NOT forward()
return self.layer(x)
# No dtype in constructors - use set_dtype()
layer = nn.Linear(10, 10)
layer.set_dtype(mx.bfloat16)
See references/neural-networks.md for layer equivalents.
a = mx.array([1.0], dtype=mx.float64)
mx.exp(a, stream=mx.gpu) # RuntimeError!
# Solutions:
mx.exp(a, stream=mx.cpu)
mx.exp(a.astype(mx.float32))
# bfloat16 from external sources gets misinterpreted
from ml_dtypes import bfloat16
x = np.array(1., dtype=bfloat16)
mx.array(x) # Returns complex64!
mx.array(x.astype(np.float32), dtype=mx.bfloat16) # Correct
See references/dtypes.md for full type support table.
from functools import partial
state = [model.state, optimizer.state, mx.random.state] # Include random!
@partial(mx.compile, inputs=state, outputs=state)
def train_step(x, y):
loss, grads = nn.value_and_grad(model, loss_fn)(model, x, y)
optimizer.update(model, grads)
return loss
# No print() in compiled functions - crashes during tracing
# String decoding triggers recompilation - decode outside loop
See references/compilation.md for recompilation triggers.
| Type | GPU | Notes | |------|-----|-------| | float32 | Yes | Default float | | float16 | Yes | | | bfloat16 | Yes | M3+ recommended | | float64 | CPU only | GPU throws! | | int8-64, uint8-64 | Yes | | | complex64 | Partial | No matmul |
| PyTorch | MLX |
|---------|-----|
| tensor.to('cuda') | Not needed (unified memory) |
| nn.forward() | nn.__call__() |
| NCHW format | NHWC format |
| torch.gather() | mx.take_along_axis() |
| torch.scatter_add_() | arr.at[idx].add() |
np.nonzero() - restructure algorithmnp.unique() - pre-sort or use dictsarr[bool_mask] read - use mx.where()np.linalg.det(), np.linalg.lstsq()import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim
from functools import partial
class Model(nn.Module):
def __init__(self):
super().__init__()
self.layers = [nn.Linear(784, 256), nn.Linear(256, 10)]
def __call__(self, x):
for layer in self.layers[:-1]:
x = mx.maximum(layer(x), 0)
return self.layers[-1](x)
def loss_fn(model, x, y):
return nn.losses.cross_entropy(model(x), y, reduction="mean")
model = Model()
optimizer = optim.AdamW(learning_rate=1e-3)
state = [model.state, optimizer.state, mx.random.state]
@partial(mx.compile, inputs=state, outputs=state)
def train_step(x, y):
loss, grads = nn.value_and_grad(model, loss_fn)(model, x, y)
optimizer.update(model, grads)
return loss
for epoch in range(num_epochs):
for x_batch, y_batch in dataloader:
loss = train_step(x_batch, y_batch)
mx.eval(state)
print(f"Epoch {epoch}: {loss.item():.4f}")
development
This skill should be used when the user wants a technical interview preparation roadmap, coding interview study plan, or DSA practice plan tailored to a specific company and role. Trigger phrases include "technical interview roadmap", "coding interview prep for", "DSA roadmap for", "DSA study plan", "leetcode prep for", "what problems should I practice for", "interview study plan", "prep me for the technical rounds", "technical prep for", "what should I study for", "coding prep plan", "roadmap from this JD", "prep me for this role [URL]", or providing a JD URL with a request for technical interview preparation.
development
This skill should be used when the user asks to "write a blog post", "draft a blog post", "create a technical blog", "write a deep dive", "write an explainer", "blog about", "write a tutorial post", "turn this into a blog post", or wants to create technical content for a personal blog or static site. Default platform is Jekyll (Gundersen-style) with KaTeX math, BibTeX citations via jekyll-scholar, and custom figure HTML. Covers deep dives, explainers, tutorials, and project showcases on ML, statistics, computer science, finance, math, and quantitative topics. Generates Markdown with SEO frontmatter, code examples, and diagram suggestions.
development
This skill should be used when the user has already run resume-analyzer and wants to generate the tailored resume.tex. Trigger phrases include "generate resume", "write the resume", "create resume.tex", "tailor the resume now", "build the resume from notes", or when the user asks to proceed after a resume analysis session. It reads the notes.md produced by resume-analyzer and generates a tailored LaTeX resume.
development
This skill should be used when the user wants to analyze a job description against their resume, extract keywords, identify gaps, or prepare tailoring notes. Trigger phrases include "analyze JD", "analyze this job description", "extract keywords from JD", "gap analysis for", "what does this role need", "compare my resume to this JD", "tailor resume", "optimize resume for JD", "build resume for", "target job description", "customize resume for", "resume for this role", "refactor resume", "update resume for", "match resume to JD", or when a user pastes a job description alongside their resume. It produces a notes.md analysis file that resume-tailor uses to generate the final resume.