
luqmannurhakimbazman/mlx-dev

Name: mlx-dev
Author: luqmannurhakimbazman

egg/skills/mlx-dev/SKILL.md

npx skillsauth add luqmannurhakimbazman/ashford mlx-dev


MLX Development Guide

Environment Setup

Use uv for Python environment and package management:

# Install MLX
uv add mlx

# Run MLX scripts
uv run python train.py

# Run with specific dependencies
uv run --with mlx python script.py

Critical Rules

1. Lazy Evaluation - Always Evaluate at Loop Boundaries

Operations build a graph; nothing computes until mx.eval():

# CORRECT: Evaluate at iteration boundaries
for batch in dataset:
    loss, grads = value_and_grad_fn(model, batch)
    optimizer.update(model, grads)
    mx.eval(loss, model.parameters(), optimizer.state)  # ALL computation here

# WRONG: Evaluating too frequently
for _ in range(100):
    a = a + b
    mx.eval(a)  # Massive overhead!

Implicit eval triggers: print(a), a.item(), np.array(a), and conditionals such as if a > 0.
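To make the deferred-computation model concrete, here is a toy stand-in written in plain Python. It is not MLX code and does not reflect how MLX is implemented; `LazyValue`, `is_pending`, and `eval` are hypothetical names used only to show that building the graph does no arithmetic until evaluation is requested.

```python
class LazyValue:
    """A node that records an addition instead of computing it."""

    def __init__(self, value=None, parents=()):
        self._value = value      # concrete number once computed
        self._parents = parents  # nodes this one depends on

    @property
    def is_pending(self):
        return self._value is None

    def __add__(self, other):
        # Building the graph is cheap: no arithmetic happens here.
        return LazyValue(parents=(self, other))

    def eval(self):
        # Walk the recorded graph and compute anything still pending.
        if self._value is None:
            self._value = sum(parent.eval() for parent in self._parents)
            self._parents = ()  # drop the graph once materialized
        return self._value


a, b = LazyValue(1), LazyValue(2)

# Record 100 additions, then evaluate once at the boundary.
for _ in range(100):
    a = a + b

pending_before = a.is_pending  # True: nothing has been computed yet
result = a.eval()              # 201: the whole chain is computed here
```

In this sketch a single `eval()` at the end resolves the full chain, which mirrors the "evaluate at loop boundaries" rule above.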

2. Array Indexing Differs from NumPy

# Lists must be mx.array
a[[0, 1]]              # ValueError!
a[mx.array([0, 1])]    # Works

# Slice indices must be Python ints
i = mx.array(2)
x[i:i+2]               # ValueError!
x[i.item():i.item()+2] # Works (forces eval)

# Slices create COPIES, not views (opposite of NumPy)
b = a[:]
b[2] = 0  # a is unchanged!

# Boolean mask READS not supported
a[mask]  # Not supported - use mx.where()

# No bounds checking - out-of-bounds returns garbage

For accumulating updates, use at[] syntax:

a = a.at[idx].add(1)  # Properly accumulates at duplicate indices

See references/array-indexing.md for complete patterns.
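The difference between a plain indexed update and an accumulating update only shows up when an index repeats. The sketch below models both semantics with Python lists; it is not MLX code, and `scatter_assign` and `scatter_add` are hypothetical helper names.

```python
def scatter_assign(values, indices, delta):
    """Model of a[idx] = a[idx] + delta: gather once, then write per index."""
    gathered = [values[i] + delta for i in indices]  # reads see the OLD values
    out = list(values)
    for i, v in zip(indices, gathered):
        out[i] = v  # a duplicate index just rewrites the same result
    return out


def scatter_add(values, indices, delta):
    """Model of a.at[idx].add(delta): every occurrence contributes."""
    out = list(values)
    for i in indices:
        out[i] += delta
    return out


a = [0, 0, 0]
idx = [0, 0, 2]  # index 0 appears twice

assigned = scatter_assign(a, idx, 1)  # [1, 0, 1]
accumulated = scatter_add(a, idx, 1)  # [2, 0, 1]
```

With the duplicate index, the assignment model bumps position 0 once while the accumulating model bumps it twice, which is the behavior the at[] syntax is meant to provide.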

3. Neural Networks: NHWC Format and __call__

# Conv2d uses NHWC (not NCHW like PyTorch)
x_mlx = mx.array(x_torch.numpy().transpose(0, 2, 3, 1))

# Override __call__, not forward()
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(10, 10)

    def __call__(self, x):  # NOT forward()
        return self.layer(x)

# No dtype in constructors - use set_dtype()
layer = nn.Linear(10, 10)
layer.set_dtype(mx.bfloat16)

See references/neural-networks.md for layer equivalents.
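The (0, 2, 3, 1) permutation used above can be checked on shapes and on a tiny tensor without any array library. This is a plain-Python sketch with hypothetical helper names (`permute_shape`, `nchw_to_nhwc`). The weight-shape line assumes the usual PyTorch Conv2d weight layout of (out, in, kH, kW) and an MLX layout of (out, kH, kW, in), so verify both against the layers you are porting.

```python
NCHW_TO_NHWC = (0, 2, 3, 1)


def permute_shape(shape, axes):
    """Return the shape that results from reordering axes."""
    return tuple(shape[ax] for ax in axes)


def nchw_to_nhwc(x):
    """Transpose a nested-list tensor indexed [n][c][h][w] to [n][h][w][c]."""
    n_dim, c_dim = len(x), len(x[0])
    h_dim, w_dim = len(x[0][0]), len(x[0][0][0])
    return [[[[x[n][c][h][w] for c in range(c_dim)]
              for w in range(w_dim)]
             for h in range(h_dim)]
            for n in range(n_dim)]


# Activations: batch of 8 RGB images, 32x32.
activation_shape = permute_shape((8, 3, 32, 32), NCHW_TO_NHWC)  # (8, 32, 32, 3)

# Conv weights (assumed layouts, see lead-in): the same permutation applies.
weight_shape = permute_shape((16, 3, 5, 5), NCHW_TO_NHWC)  # (16, 5, 5, 3)

# One image, two channels, 1x2 pixels: channels become the innermost axis.
x = [[[[1, 2]], [[10, 20]]]]
y = nchw_to_nhwc(x)  # [[[[1, 10], [2, 20]]]]
```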

4. Data Types: float64 is CPU-Only

a = mx.array([1.0], dtype=mx.float64)
mx.exp(a, stream=mx.gpu)  # RuntimeError!

# Solutions:
mx.exp(a, stream=mx.cpu)
mx.exp(a.astype(mx.float32))

# bfloat16 from external sources gets misinterpreted
import numpy as np
from ml_dtypes import bfloat16
x = np.array(1., dtype=bfloat16)
mx.array(x)  # Returns complex64!
mx.array(x.astype(np.float32), dtype=mx.bfloat16)  # Correct

See references/dtypes.md for full type support table.
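Routing bfloat16 data through float32, as in the workaround above, loses nothing because a bfloat16 value is the upper 16 bits of a float32 encoding. The standard-library sketch below illustrates that relationship. The helper names are hypothetical, and the narrowing step truncates for simplicity, whereas real converters usually round.

```python
import struct


def float32_to_bfloat16_bits(value):
    """Keep the upper 16 bits of the float32 encoding (truncation, no rounding)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return bits >> 16


def bfloat16_bits_to_float32(bits16):
    """Pad with 16 zero bits to recover an exactly equal float32 value."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


bits = float32_to_bfloat16_bits(3.14159)        # 0x4049
widened = bfloat16_bits_to_float32(bits)        # 3.140625
round_trip = float32_to_bfloat16_bits(widened)  # same 16 bits again
```

Widening is exact and narrowing the widened value returns the same bits, so the float32 detour preserves every bfloat16 value.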

5. Compilation: Capture All Mutable State

from functools import partial

state = [model.state, optimizer.state, mx.random.state]  # Include random!

@partial(mx.compile, inputs=state, outputs=state)
def train_step(x, y):
    loss, grads = nn.value_and_grad(model, loss_fn)(model, x, y)
    optimizer.update(model, grads)
    return loss

# No print() in compiled functions - crashes during tracing
# String decoding triggers recompilation - decode outside loop

See references/compilation.md for recompilation triggers.
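One way to picture a recompilation trigger is as a cache miss in a trace cache keyed on the input signature. The toy below is plain Python, not MLX's implementation; `ToyCompiled`, `traces`, and `trace_count` are hypothetical, and the "signature" is reduced to the batch length purely for illustration.

```python
class ToyCompiled:
    """Caches one 'trace' per input signature and counts how many were built."""

    def __init__(self, fn):
        self.fn = fn
        self.traces = {}      # signature -> callable reused on cache hits
        self.trace_count = 0  # how many times the expensive step ran

    def __call__(self, rows):
        signature = len(rows)  # stand-in for shape, dtype, and input count
        if signature not in self.traces:
            self.trace_count += 1  # the costly step in a real compiler
            self.traces[signature] = self.fn
        return self.traces[signature](rows)


step = ToyCompiled(lambda rows: sum(rows))

first = step([1, 2, 3, 4])
second = step([5, 6, 7, 8])  # same length: cache hit, no new trace
ragged = step([9, 10])       # shorter final batch: a second trace is built
```

Under this model, keeping input shapes stable across iterations (for example by dropping or padding a short final batch) avoids repeated tracing.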

Quick Reference Tables

Dtype Support

| Type | GPU | Notes |
|------|-----|-------|
| float32 | Yes | Default float |
| float16 | Yes | |
| bfloat16 | Yes | M3+ recommended |
| float64 | CPU only | GPU throws! |
| int8-64, uint8-64 | Yes | |
| complex64 | Partial | No matmul |

PyTorch -> MLX Equivalents

| PyTorch | MLX |
|---------|-----|
| tensor.to('cuda') | Not needed (unified memory) |
| nn.forward() | nn.__call__() |
| NCHW format | NHWC format |
| torch.gather() | mx.take_along_axis() |
| torch.scatter_add_() | arr.at[idx].add() |

Not Available in MLX

np.nonzero() - restructure algorithm
np.unique() - pre-sort or use dicts
arr[bool_mask] read - use mx.where()
np.linalg.det(), np.linalg.lstsq()

Performance Notes

Transformers: MLX typically 2-3x faster than PyTorch MPS
Convolutions: 10-150x SLOWER than PyTorch MPS (known limitation)
LLM inference: Excellent, especially quantized
Use float16/bfloat16 to halve memory traffic relative to float32 (about 2x effective bandwidth)
Use 4-bit quantization for LLMs (about 4x less weight traffic than 16-bit)
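The 2x and 4x figures follow from bytes per parameter. The back-of-the-envelope sketch below assumes a hypothetical 7-billion-parameter model and a hypothetical 400 GB/s of memory bandwidth, and treats each generated token as one full pass over the weights. Real 4-bit formats also store per-group scales, so actual sizes are somewhat larger.

```python
PARAMS = 7e9            # hypothetical model size (parameters)
BANDWIDTH_GB_S = 400.0  # hypothetical memory bandwidth

# Bytes needed to store one parameter (quantization metadata ignored).
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "4-bit": 0.5}


def weight_gigabytes(dtype):
    """Size of the weights in GB for the given storage type."""
    return PARAMS * BYTES_PER_PARAM[dtype] / 1e9


def max_tokens_per_second(dtype):
    """Upper bound if every token requires streaming all weights once."""
    return BANDWIDTH_GB_S / weight_gigabytes(dtype)


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_gigabytes(dtype):5.1f} GB, "
          f"<= {max_tokens_per_second(dtype):6.1f} tokens/s")

speedup_fp16 = max_tokens_per_second("float16") / max_tokens_per_second("float32")
speedup_4bit = max_tokens_per_second("4-bit") / max_tokens_per_second("float16")
```

These are bounds for a bandwidth-limited workload only; compute-bound stages will not scale the same way.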

Idiomatic Training Example

import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim
from functools import partial

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(784, 256), nn.Linear(256, 10)]

    def __call__(self, x):
        for layer in self.layers[:-1]:
            x = mx.maximum(layer(x), 0)
        return self.layers[-1](x)

def loss_fn(model, x, y):
    return nn.losses.cross_entropy(model(x), y, reduction="mean")

model = Model()
optimizer = optim.AdamW(learning_rate=1e-3)

state = [model.state, optimizer.state, mx.random.state]

@partial(mx.compile, inputs=state, outputs=state)
def train_step(x, y):
    loss, grads = nn.value_and_grad(model, loss_fn)(model, x, y)
    optimizer.update(model, grads)
    return loss

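# num_epochs and dataloader are placeholders: supply an epoch count and an
# iterable of (x_batch, y_batch) mx.array pairs.
for epoch in range(num_epochs):
    for x_batch, y_batch in dataloader:
        loss = train_step(x_batch, y_batch)
        mx.eval(state)
    print(f"Epoch {epoch}: {loss.item():.4f}")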
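The training loop leaves `dataloader` undefined. One minimal way to produce batches is sketched below in plain Python. `batch_iterate` is a hypothetical helper, and the toy data is made up for illustration. In a real MLX loop each yielded batch would be converted to an mx.array before being passed to the training step.

```python
import random


def batch_iterate(features, labels, batch_size, seed=0):
    """Yield shuffled (x_batch, y_batch) pairs, dropping the ragged tail."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    usable = len(order) - len(order) % batch_size  # keep only full batches
    for start in range(0, usable, batch_size):
        picks = order[start:start + batch_size]
        yield [features[i] for i in picks], [labels[i] for i in picks]


# Toy dataset: 10 samples with 4 features each and a binary label.
features = [[float(i)] * 4 for i in range(10)]
labels = [i % 2 for i in range(10)]

batches = list(batch_iterate(features, labels, batch_size=4))
```

Dropping the short final batch is a design choice that keeps every batch the same shape; pad instead if every sample must be seen each epoch.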


Write correct, idiomatic Apple MLX code for Apple Silicon ML. Use when working with MLX arrays, neural networks, training loops, lazy evaluation, unified memory, mx.eval, mx.compile, Metal GPU, memory optimization, quantization, or Apple Silicon performance. Covers critical API differences from PyTorch/NumPy, array indexing gotchas (lists must be mx.array, slices create copies), NHWC format for Conv2d, __call__ not forward(), float64 CPU-only, mlx-lm integration, and debugging patterns.

development

Updated Apr 7, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add luqmannurhakimbazman/ashford mlx-dev

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status |
|---------|-------------|--------|
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| VirusTotal | Multi-engine malware detection | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: Apr 7, 2026, 2:47 AM (30.9s, 11 files scanned)



Related Skills

luqmannurhakimbazman/technical-interview-roadmap

development

Verified, Trusted, Community

This skill should be used when the user wants a technical interview preparation roadmap, coding interview study plan, or DSA practice plan tailored to a specific company and role. Trigger phrases include "technical interview roadmap", "coding interview prep for", "DSA roadmap for", "DSA study plan", "leetcode prep for", "what problems should I practice for", "interview study plan", "prep me for the technical rounds", "technical prep for", "what should I study for", "coding prep plan", "roadmap from this JD", "prep me for this role [URL]", or providing a JD URL with a request for technical interview preparation.

SKILL.md, updated Apr 7, 2026

luqmannurhakimbazman/Technical Blog Writer

development

Verified, Trusted, Community

This skill should be used when the user asks to "write a blog post", "draft a blog post", "create a technical blog", "write a deep dive", "write an explainer", "blog about", "write a tutorial post", "turn this into a blog post", or wants to create technical content for a personal blog or static site. Default platform is Jekyll (Gundersen-style) with KaTeX math, BibTeX citations via jekyll-scholar, and custom figure HTML. Covers deep dives, explainers, tutorials, and project showcases on ML, statistics, computer science, finance, math, and quantitative topics. Generates Markdown with SEO frontmatter, code examples, and diagram suggestions.

SKILL.md, updated Apr 7, 2026

luqmannurhakimbazman/resume-tailor

development

Verified, Trusted, Community

This skill should be used when the user has already run resume-analyzer and wants to generate the tailored resume.tex. Trigger phrases include "generate resume", "write the resume", "create resume.tex", "tailor the resume now", "build the resume from notes", or when the user asks to proceed after a resume analysis session. It reads the notes.md produced by resume-analyzer and generates a tailored LaTeX resume.

SKILL.md, updated Apr 7, 2026

luqmannurhakimbazman/resume-analyzer

development

Verified, Trusted, Community

This skill should be used when the user wants to analyze a job description against their resume, extract keywords, identify gaps, or prepare tailoring notes. Trigger phrases include "analyze JD", "analyze this job description", "extract keywords from JD", "gap analysis for", "what does this role need", "compare my resume to this JD", "tailor resume", "optimize resume for JD", "build resume for", "target job description", "customize resume for", "resume for this role", "refactor resume", "update resume for", "match resume to JD", or when a user pastes a job description alongside their resume. It produces a notes.md analysis file that resume-tailor uses to generate the final resume.

SKILL.md, updated Apr 7, 2026


Install manually


# Clone the repo
git clone https://github.com/luqmannurhakimbazman/ashford.git

# Copy into Claude Code skills folder (global); create it first if missing
mkdir -p ~/.claude/skills
cp -r ashford/egg/skills/mlx-dev ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository: luqmannurhakimbazman/ashford

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
