skills/domain-ml/SKILL.md
Use when building ML/AI apps in Rust. Keywords: machine learning, ML, AI, tensor, model, inference, neural network, deep learning, training, prediction, ndarray, tch-rs, burn, candle, 机器学习, 人工智能, 模型推理
npx skillsauth add actionbook/rust-skills domain-mlInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
4 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Layer 3: Domain Constraints
| Domain Rule | Design Constraint | Rust Implication | |-------------|-------------------|------------------| | Large data | Efficient memory | Zero-copy, streaming | | GPU acceleration | CUDA/Metal support | candle, tch-rs | | Model portability | Standard formats | ONNX | | Batch processing | Throughput over latency | Batched inference | | Numerical precision | Float handling | ndarray, careful f32/f64 | | Reproducibility | Deterministic | Seeded random, versioning |
RULE: Avoid copying large tensors
WHY: Memory bandwidth is bottleneck
RUST: References, views, in-place ops
RULE: Batch operations for GPU efficiency
WHY: GPU overhead per kernel launch
RUST: Batch sizes, async data loading
RULE: Use standard model formats
WHY: Train in Python, deploy in Rust
RUST: ONNX via tract or candle
From constraints to design (Layer 2):
"Need efficient data pipelines"
↓ m10-performance: Streaming, batching
↓ polars: Lazy evaluation
"Need GPU inference"
↓ m07-concurrency: Async data loading
↓ candle/tch-rs: CUDA backend
"Need model loading"
↓ m12-lifecycle: Lazy init, caching
↓ tract: ONNX runtime
| Use Case | Recommended | Why | |----------|-------------|-----| | Inference only | tract (ONNX) | Lightweight, portable | | Training + inference | candle, burn | Pure Rust, GPU | | PyTorch models | tch-rs | Direct bindings | | Data pipelines | polars | Fast, lazy eval |
| Purpose | Crate | |---------|-------| | Tensors | ndarray | | ONNX inference | tract | | ML framework | candle, burn | | PyTorch bindings | tch-rs | | Data processing | polars | | Embeddings | fastembed |
| Pattern | Purpose | Implementation |
|---------|---------|----------------|
| Model loading | Once, reuse | OnceLock<Model> |
| Batching | Throughput | Collect then process |
| Streaming | Large data | Iterator-based |
| GPU async | Parallelism | Data loading parallel to compute |
use std::sync::OnceLock;
use tract_onnx::prelude::*;
static MODEL: OnceLock<SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>> = OnceLock::new();
fn get_model() -> &'static SimplePlan<...> {
MODEL.get_or_init(|| {
tract_onnx::onnx()
.model_for_path("model.onnx")
.unwrap()
.into_optimized()
.unwrap()
.into_runnable()
.unwrap()
})
}
async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
let model = get_model();
let input = tract_ndarray::arr1(&input).into_shape((1, input.len()))?;
let result = model.run(tvec!(input.into()))?;
Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}
async fn batch_predict(inputs: Vec<Vec<f32>>, batch_size: usize) -> Vec<Vec<f32>> {
let mut results = Vec::with_capacity(inputs.len());
for batch in inputs.chunks(batch_size) {
// Stack inputs into batch tensor
let batch_tensor = stack_inputs(batch);
// Run inference on batch
let batch_output = model.run(batch_tensor).await;
// Unstack results
results.extend(unstack_outputs(batch_output));
}
results
}
| Mistake | Domain Violation | Fix | |---------|-----------------|-----| | Clone tensors | Memory waste | Use views | | Single inference | GPU underutilized | Batch processing | | Load model per request | Slow | Singleton pattern | | Sync data loading | GPU idle | Async pipeline |
| Constraint | Layer 2 Pattern | Layer 1 Implementation | |------------|-----------------|------------------------| | Memory efficiency | Zero-copy | ndarray views | | Model singleton | Lazy init | OnceLock<Model> | | Batch processing | Chunked iteration | chunks() + parallel | | GPU async | Concurrent loading | tokio::spawn + GPU |
| When | See | |------|-----| | Performance | m10-performance | | Lazy initialization | m12-lifecycle | | Async patterns | m07-concurrency | | Memory efficiency | m01-ownership |
development
CRITICAL: Use for ALL Rust questions including errors, design, and coding. HIGHEST PRIORITY for: 比较, 对比, compare, vs, versus, 区别, difference, 最佳实践, best practice, tokio vs, async-std vs, 比较 tokio, 比较 async, Triggers on: Rust, cargo, rustc, crate, Cargo.toml, 意图分析, 问题分析, 语义分析, analyze intent, question analysis, compile error, borrow error, lifetime error, ownership error, type error, trait error, value moved, cannot borrow, does not live long enough, mismatched types, not satisfied, E0382, E0597, E0277, E0308, E0499, E0502, E0596, async, await, Send, Sync, tokio, concurrency, error handling, 编译错误, compile error, 所有权, ownership, 借用, borrow, 生命周期, lifetime, 类型错误, type error, 异步, async, 并发, concurrency, 错误处理, error handling, 问题, problem, question, 怎么用, how to use, 如何, how to, 为什么, why, 什么是, what is, 帮我写, help me write, 实现, implement, 解释, explain
development
Internal maintenance support for checking and fixing generated Rust skill documentation references. Use only when explicitly invoked by /fix-skill-docs.
development
Internal command support for dynamic Rust crate skill management. Use only when explicitly invoked by /sync-crate-skills, /clean-crate-skills, or /update-crate-skill.
tools
Internal support skill for agent-browser CLI workflows used by rust-learner, docs-researcher, and crate-researcher. Use only when browser automation is explicitly required.