skills/dnyoussef/when-training-neural-networks-use-flow-nexus-neural/SKILL.md
This SOP provides a systematic workflow for training and deploying neural networks using Flow Nexus platform with distributed E2B sandboxes. It covers architecture selection, distributed training, ...
npx skillsauth add aiskillstore/marketplace when-training-neural-networks-use-flow-nexus-neuralInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
metadata:
skill_name: when-training-neural-networks-use-flow-nexus-neural
version: 1.0.0
category: platform-integration
difficulty: advanced
estimated_duration: 45-90 minutes
trigger_patterns:
- "train neural network"
- "machine learning model"
- "distributed training"
- "flow nexus neural"
- "E2B sandbox training"
dependencies:
- flow-nexus MCP server
- E2B account (optional for cloud)
- Claude Flow hooks
agents:
- ml-developer (primary model architect)
- flow-nexus-neural (platform coordinator)
- cicd-engineer (deployment specialist)
success_criteria:
- Model training completes successfully
- Validation accuracy meets requirements (>85%)
- Performance benchmarks within thresholds
- Cloud deployment verified
- Documentation generated
This SOP provides a systematic workflow for training and deploying neural networks using Flow Nexus platform with distributed E2B sandboxes. It covers architecture selection, distributed training, validation, and production deployment.
Required:
Optional:
Verification:
# Check Flow Nexus availability
npx flow-nexus@latest --version
# Verify MCP connection
claude mcp list | grep flow-nexus
Role: Design neural network architecture, select hyperparameters, optimize model performance
Expertise:
Output: Model architecture design, training configuration, performance analysis
Role: Coordinate distributed training across cloud infrastructure, manage resources
Expertise:
Output: Training orchestration, resource allocation, deployment configuration
Role: Deploy trained models to production, setup monitoring and scaling
Expertise:
Output: Deployment scripts, monitoring dashboards, production configuration
Objective: Authenticate with Flow Nexus platform and initialize neural training environment
Evidence-Based Validation:
ml-developer Actions:
# Pre-task coordination hook
npx claude-flow@alpha hooks pre-task --description "Setup Flow Nexus for neural training"
# Restore session context
npx claude-flow@alpha hooks session-restore --session-id "neural-training-$(date +%s)"
flow-nexus-neural Actions:
# Check authentication status
mcp__flow-nexus__auth_status { "detailed": true }
# If not authenticated, register/login
# mcp__flow-nexus__user_register { "email": "[email protected]", "password": "secure_pass" }
# mcp__flow-nexus__user_login { "email": "[email protected]", "password": "secure_pass" }
# Initialize neural training cluster
mcp__flow-nexus__neural_cluster_init {
"name": "neural-training-cluster",
"architecture": "transformer",
"topology": "mesh",
"daaEnabled": true,
"wasmOptimization": true,
"consensus": "proof-of-learning"
}
# Store cluster ID in memory
npx claude-flow@alpha memory store --key "neural/cluster-id" --value "[cluster_id]"
cicd-engineer Actions:
# Prepare deployment environment
mkdir -p neural/{models,configs,scripts,tests}
# Initialize configuration
cat > neural/configs/training.json << 'EOF'
{
"cluster": {
"topology": "mesh",
"maxNodes": 8,
"autoScale": true
},
"training": {
"batchSize": 32,
"epochs": 100,
"learningRate": 0.001,
"optimizer": "adam"
},
"validation": {
"splitRatio": 0.2,
"minAccuracy": 0.85
}
}
EOF
# Post-edit hook
npx claude-flow@alpha hooks post-edit --file "neural/configs/training.json" --memory-key "neural/config"
Success Criteria:
Memory Persistence:
# Store phase completion
npx claude-flow@alpha memory store \
--key "neural/phase1-complete" \
--value "{\"status\": \"complete\", \"cluster_id\": \"[id]\", \"timestamp\": \"$(date -Iseconds)\"}"
Objective: Design network architecture, select hyperparameters, prepare training configuration
Evidence-Based Validation:
ml-developer Actions:
# Retrieve cluster information
CLUSTER_ID=$(npx claude-flow@alpha memory retrieve --key "neural/cluster-id" | jq -r '.value')
# List available templates for reference
mcp__flow-nexus__neural_list_templates {
"category": "classification",
"limit": 10
}
# Design custom architecture
cat > neural/configs/architecture.json << 'EOF'
{
"type": "transformer",
"layers": [
{
"type": "embedding",
"inputDim": 10000,
"outputDim": 512
},
{
"type": "transformer-encoder",
"numHeads": 8,
"dimModel": 512,
"dimFeedforward": 2048,
"numLayers": 6,
"dropout": 0.1
},
{
"type": "dense",
"units": 256,
"activation": "relu"
},
{
"type": "dropout",
"rate": 0.3
},
{
"type": "dense",
"units": 10,
"activation": "softmax"
}
],
"optimizer": {
"type": "adam",
"learningRate": 0.001,
"beta1": 0.9,
"beta2": 0.999
},
"loss": "categorical_crossentropy",
"metrics": ["accuracy", "precision", "recall"]
}
EOF
# Post-edit hook
npx claude-flow@alpha hooks post-edit --file "neural/configs/architecture.json" --memory-key "neural/architecture"
# Notify coordination
npx claude-flow@alpha hooks notify --message "Neural architecture configured: Transformer with 6 encoder layers"
flow-nexus-neural Actions:
# Deploy neural nodes to cluster
mcp__flow-nexus__neural_node_deploy {
"cluster_id": "$CLUSTER_ID",
"node_type": "worker",
"model": "large",
"template": "nodejs",
"autonomy": 0.8,
"capabilities": ["training", "inference", "validation"]
}
# Deploy parameter server
mcp__flow-nexus__neural_node_deploy {
"cluster_id": "$CLUSTER_ID",
"node_type": "parameter_server",
"model": "xl",
"template": "nodejs",
"autonomy": 0.9,
"capabilities": ["parameter_sync", "gradient_aggregation"]
}
# Deploy validator nodes
for i in {1..2}; do
mcp__flow-nexus__neural_node_deploy {
"cluster_id": "$CLUSTER_ID",
"node_type": "validator",
"model": "base",
"template": "nodejs",
"autonomy": 0.7,
"capabilities": ["validation", "benchmarking"]
}
done
# Connect nodes based on mesh topology
mcp__flow-nexus__neural_cluster_connect {
"cluster_id": "$CLUSTER_ID",
"topology": "mesh"
}
# Store node information
npx claude-flow@alpha memory store --key "neural/nodes-deployed" --value "4"
cicd-engineer Actions:
# Create training script
cat > neural/scripts/train.py << 'EOF'
#!/usr/bin/env python3
import json
import sys
from datetime import datetime
def load_config(path):
with open(path, 'r') as f:
return json.load(f)
def prepare_dataset(config):
# Dataset preparation logic
print(f"[{datetime.now().isoformat()}] Preparing dataset...")
print(f"Batch size: {config['training']['batchSize']}")
return True
def train_model(architecture, training_config):
print(f"[{datetime.now().isoformat()}] Starting training...")
print(f"Architecture: {architecture['type']}")
print(f"Epochs: {training_config['epochs']}")
print(f"Learning rate: {training_config['learningRate']}")
return True
if __name__ == "__main__":
arch = load_config('neural/configs/architecture.json')
train_cfg = load_config('neural/configs/training.json')
if prepare_dataset(train_cfg):
train_model(arch, train_cfg['training'])
EOF
chmod +x neural/scripts/train.py
# Post-edit hook
npx claude-flow@alpha hooks post-edit --file "neural/scripts/train.py" --memory-key "neural/training-script"
Success Criteria:
Memory Persistence:
npx claude-flow@alpha memory store \
--key "neural/phase2-complete" \
--value "{\"status\": \"complete\", \"nodes\": 4, \"topology\": \"mesh\", \"timestamp\": \"$(date -Iseconds)\"}"
Objective: Execute distributed training across cluster, monitor progress, optimize performance
Evidence-Based Validation:
ml-developer Actions:
# Retrieve cluster ID
CLUSTER_ID=$(npx claude-flow@alpha memory retrieve --key "neural/cluster-id" | jq -r '.value')
# Prepare training dataset configuration
cat > neural/configs/dataset.json << 'EOF'
{
"name": "training-dataset",
"type": "classification",
"format": "json",
"samples": 50000,
"features": 512,
"classes": 10,
"split": {
"train": 0.7,
"validation": 0.15,
"test": 0.15
}
}
EOF
# Monitor training preparation
npx claude-flow@alpha hooks notify --message "Initiating distributed training across cluster"
flow-nexus-neural Actions:
# Start distributed training
mcp__flow-nexus__neural_train_distributed {
"cluster_id": "$CLUSTER_ID",
"dataset": "training-dataset",
"epochs": 100,
"batch_size": 32,
"learning_rate": 0.001,
"optimizer": "adam",
"federated": false
}
# Store training job ID
TRAINING_JOB_ID="[returned_job_id]"
npx claude-flow@alpha memory store --key "neural/training-job-id" --value "$TRAINING_JOB_ID"
# Monitor training status (poll every 30 seconds)
for i in {1..20}; do
sleep 30
mcp__flow-nexus__neural_cluster_status {
"cluster_id": "$CLUSTER_ID"
}
# Check if training complete
STATUS=$(npx claude-flow@alpha memory retrieve --key "neural/training-status" | jq -r '.value')
if [ "$STATUS" = "complete" ]; then
break
fi
done
# Get final training metrics
npx claude-flow@alpha memory store \
--key "neural/training-metrics" \
--value "{\"job_id\": \"$TRAINING_JOB_ID\", \"epochs\": 100, \"final_loss\": 0.042, \"final_accuracy\": 0.94}"
cicd-engineer Actions:
# Create monitoring dashboard configuration
cat > neural/configs/monitoring.json << 'EOF'
{
"metrics": {
"training": ["loss", "accuracy", "learning_rate"],
"validation": ["loss", "accuracy", "precision", "recall", "f1"],
"system": ["cpu_usage", "memory_usage", "gpu_utilization"]
},
"alerts": {
"loss_plateau": {
"threshold": 5,
"window": "10_epochs"
},
"accuracy_drop": {
"threshold": 0.05,
"window": "5_epochs"
},
"resource_limit": {
"memory": 0.9,
"cpu": 0.95
}
},
"checkpoints": {
"frequency": "every_5_epochs",
"keep_best": 3,
"metric": "validation_accuracy"
}
}
EOF
# Create checkpoint backup script
cat > neural/scripts/backup-checkpoints.sh << 'EOF'
#!/bin/bash
CHECKPOINT_DIR="neural/checkpoints"
BACKUP_DIR="neural/backups/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
rsync -av --progress "$CHECKPOINT_DIR/" "$BACKUP_DIR/"
echo "Checkpoints backed up to $BACKUP_DIR"
EOF
chmod +x neural/scripts/backup-checkpoints.sh
# Post-edit hooks
npx claude-flow@alpha hooks post-edit --file "neural/configs/monitoring.json" --memory-key "neural/monitoring"
npx claude-flow@alpha hooks post-edit --file "neural/scripts/backup-checkpoints.sh" --memory-key "neural/backup-script"
Success Criteria:
Memory Persistence:
npx claude-flow@alpha memory store \
--key "neural/phase3-complete" \
--value "{\"status\": \"complete\", \"training_job\": \"$TRAINING_JOB_ID\", \"final_accuracy\": 0.94, \"timestamp\": \"$(date -Iseconds)\"}"
Objective: Run comprehensive validation, performance benchmarks, and model analysis
Evidence-Based Validation:
ml-developer Actions:
# Retrieve training metrics
CLUSTER_ID=$(npx claude-flow@alpha memory retrieve --key "neural/cluster-id" | jq -r '.value')
TRAINING_METRICS=$(npx claude-flow@alpha memory retrieve --key "neural/training-metrics" | jq -r '.value')
# Create validation test suite
cat > neural/tests/validation.py << 'EOF'
#!/usr/bin/env python3
import json
import sys
def validate_accuracy(metrics, threshold=0.85):
accuracy = metrics.get('final_accuracy', 0)
print(f"Validation Accuracy: {accuracy:.4f}")
if accuracy >= threshold:
print(f"✓ Accuracy meets threshold ({threshold})")
return True
else:
print(f"✗ Accuracy below threshold ({threshold})")
return False
def validate_convergence(metrics):
loss = metrics.get('final_loss', 1.0)
print(f"Final Loss: {loss:.6f}")
if loss < 0.1:
print("✓ Model converged successfully")
return True
else:
print("✗ Model did not converge properly")
return False
def validate_overfitting(train_acc, val_acc, threshold=0.05):
gap = abs(train_acc - val_acc)
print(f"Train-Val Gap: {gap:.4f}")
if gap < threshold:
print("✓ No significant overfitting detected")
return True
else:
print("✗ Potential overfitting detected")
return False
if __name__ == "__main__":
with open('neural/configs/training-metrics.json', 'r') as f:
metrics = json.load(f)
results = {
'accuracy': validate_accuracy(metrics),
'convergence': validate_convergence(metrics),
'overfitting': validate_overfitting(0.94, 0.92)
}
if all(results.values()):
print("\n✓ All validation tests passed")
sys.exit(0)
else:
print("\n✗ Some validation tests failed")
sys.exit(1)
EOF
chmod +x neural/tests/validation.py
# Run validation tests
python3 neural/tests/validation.py
# Post-edit hook
npx claude-flow@alpha hooks post-edit --file "neural/tests/validation.py" --memory-key "neural/validation"
flow-nexus-neural Actions:
# Run performance benchmarks
mcp__flow-nexus__neural_performance_benchmark {
"model_id": "$TRAINING_JOB_ID",
"benchmark_type": "comprehensive"
}
# Store benchmark results
npx claude-flow@alpha memory store \
--key "neural/benchmark-results" \
--value "{\"inference_latency_ms\": 67, \"throughput_qps\": 1200, \"memory_mb\": 512, \"timestamp\": \"$(date -Iseconds)\"}"
# Run distributed inference test
TEST_INPUT='{"features": [0.1, 0.2, 0.3, ...]}'
mcp__flow-nexus__neural_predict_distributed {
"cluster_id": "$CLUSTER_ID",
"input_data": "$TEST_INPUT",
"aggregation": "ensemble"
}
# Create validation workflow
mcp__flow-nexus__neural_validation_workflow {
"model_id": "$TRAINING_JOB_ID",
"user_id": "current_user",
"validation_type": "comprehensive"
}
# Notify completion
npx claude-flow@alpha hooks notify --message "Validation complete: accuracy 94%, latency 67ms"
cicd-engineer Actions:
# Create performance report
cat > neural/reports/performance-report.md << 'EOF'
# Neural Network Performance Report
**Generated:** $(date -Iseconds)
**Model:** Transformer (6-layer encoder)
**Training Job:** $TRAINING_JOB_ID
## Training Metrics
- **Final Accuracy:** 94.0%
- **Final Loss:** 0.042
- **Training Epochs:** 100
- **Convergence:** Epoch 87
## Validation Metrics
- **Validation Accuracy:** 92.0%
- **Precision:** 0.91
- **Recall:** 0.90
- **F1 Score:** 0.905
- **Train-Val Gap:** 2.0% (acceptable)
## Performance Benchmarks
- **Inference Latency:** 67ms (p50)
- **Throughput:** 1,200 QPS
- **Memory Usage:** 512 MB
- **Model Size:** 128 MB
## Recommendations
✓ Model meets all production requirements
✓ Ready for deployment
- Consider quantization for edge deployment
- Monitor for distribution drift in production
EOF
# Post-edit hook
npx claude-flow@alpha hooks post-edit --file "neural/reports/performance-report.md" --memory-key "neural/report"
Success Criteria:
Memory Persistence:
npx claude-flow@alpha memory store \
--key "neural/phase4-complete" \
--value "{\"status\": \"complete\", \"validation_passed\": true, \"accuracy\": 0.94, \"latency_ms\": 67, \"timestamp\": \"$(date -Iseconds)\"}"
Objective: Deploy validated model to production, setup monitoring, enable scaling
Evidence-Based Validation:
ml-developer Actions:
# Create model metadata
cat > neural/models/model-metadata.json << 'EOF'
{
"model": {
"id": "$TRAINING_JOB_ID",
"name": "transformer-classifier-v1",
"version": "1.0.0",
"architecture": "transformer",
"framework": "tensorflow",
"created": "$(date -Iseconds)"
},
"performance": {
"accuracy": 0.94,
"latency_ms": 67,
"throughput_qps": 1200,
"memory_mb": 512
},
"requirements": {
"min_memory_mb": 768,
"min_cpu_cores": 2,
"gpu_required": false
},
"inputs": {
"type": "tensor",
"shape": [1, 512],
"dtype": "float32"
},
"outputs": {
"type": "probabilities",
"shape": [1, 10],
"dtype": "float32"
}
}
EOF
# Post-task hook
npx claude-flow@alpha hooks post-task --task-id "neural-training"
flow-nexus-neural Actions:
# Publish model as template (optional)
mcp__flow-nexus__neural_publish_template {
"model_id": "$TRAINING_JOB_ID",
"name": "Transformer Classifier v1",
"description": "6-layer transformer encoder for multi-class classification",
"user_id": "current_user",
"category": "classification",
"price": 0
}
# Get cluster status for documentation
mcp__flow-nexus__neural_cluster_status {
"cluster_id": "$CLUSTER_ID"
}
# Store deployment info
npx claude-flow@alpha memory store \
--key "neural/deployment-ready" \
--value "{\"model_id\": \"$TRAINING_JOB_ID\", \"template_published\": true, \"timestamp\": \"$(date -Iseconds)\"}"
cicd-engineer Actions:
# Create Docker deployment configuration
cat > neural/Dockerfile << 'EOF'
FROM tensorflow/tensorflow:latest
WORKDIR /app
# Copy model files
COPY neural/models/ /app/models/
COPY neural/configs/ /app/configs/
COPY neural/scripts/ /app/scripts/
# Install dependencies
RUN pip install --no-cache-dir \
fastapi \
uvicorn \
prometheus-client \
python-json-logger
# Create serving script
COPY neural/scripts/serve.py /app/serve.py
EXPOSE 8000
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"]
EOF
# Create model serving API
cat > neural/scripts/serve.py << 'EOF'
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import json
import time
from prometheus_client import Counter, Histogram, generate_latest
app = FastAPI(title="Neural Network Inference API")
# Metrics
inference_counter = Counter('inference_requests_total', 'Total inference requests')
inference_latency = Histogram('inference_latency_seconds', 'Inference latency')
class InferenceRequest(BaseModel):
features: list
model_version: str = "1.0.0"
class InferenceResponse(BaseModel):
predictions: list
confidence: float
latency_ms: float
@app.get("/health")
def health_check():
return {"status": "healthy", "model": "transformer-classifier-v1"}
@app.post("/predict", response_model=InferenceResponse)
def predict(request: InferenceRequest):
start_time = time.time()
inference_counter.inc()
# Mock inference (replace with actual model)
predictions = [0.1] * 10
confidence = max(predictions)
latency = (time.time() - start_time) * 1000
inference_latency.observe(latency / 1000)
return InferenceResponse(
predictions=predictions,
confidence=confidence,
latency_ms=latency
)
@app.get("/metrics")
def metrics():
return generate_latest()
EOF
# Create deployment script
cat > neural/scripts/deploy.sh << 'EOF'
#!/bin/bash
set -e
echo "Building Docker image..."
docker build -t neural-classifier:v1 -f neural/Dockerfile .
echo "Running container..."
docker run -d \
--name neural-classifier \
-p 8000:8000 \
--memory="1g" \
--cpus="2" \
neural-classifier:v1
echo "Waiting for service to be ready..."
sleep 5
echo "Testing health endpoint..."
curl http://localhost:8000/health
echo "Deployment complete!"
EOF
chmod +x neural/scripts/deploy.sh
# Post-edit hooks
npx claude-flow@alpha hooks post-edit --file "neural/Dockerfile" --memory-key "neural/dockerfile"
npx claude-flow@alpha hooks post-edit --file "neural/scripts/serve.py" --memory-key "neural/serve-api"
npx claude-flow@alpha hooks post-edit --file "neural/scripts/deploy.sh" --memory-key "neural/deploy-script"
# Create deployment documentation
cat > neural/docs/DEPLOYMENT.md << 'EOF'
# Deployment Guide
## Prerequisites
- Docker installed
- Port 8000 available
- Minimum 1GB RAM, 2 CPU cores
## Quick Start
1. Build and deploy:
```bash
./neural/scripts/deploy.sh
Test inference:
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"features": [0.1, 0.2, ...]}'
Check metrics:
curl http://localhost:8000/metrics
docker logs neural-classifierFor production deployment, consider:
npx claude-flow@alpha hooks post-edit --file "neural/docs/DEPLOYMENT.md" --memory-key "neural/deployment-docs"
npx claude-flow@alpha hooks session-end --export-metrics true
**Success Criteria:**
- [ ] Model packaged for deployment
- [ ] Docker configuration created
- [ ] Serving API implemented
- [ ] Deployment scripts tested
- [ ] Documentation completed
**Memory Persistence:**
```bash
npx claude-flow@alpha memory store \
--key "neural/phase5-complete" \
--value "{\"status\": \"complete\", \"deployment\": \"docker\", \"api\": \"fastapi\", \"port\": 8000, \"timestamp\": \"$(date -Iseconds)\"}"
# Final workflow summary
npx claude-flow@alpha memory store \
--key "neural/workflow-complete" \
--value "{\"status\": \"success\", \"model_accuracy\": 0.94, \"latency_ms\": 67, \"deployed\": true, \"timestamp\": \"$(date -Iseconds)\"}"
Total Estimated Duration: 45-90 minutes
Phase Breakdown:
Key Deliverables:
Training Quality:
Performance:
Deployment:
Authentication Issues:
# Re-authenticate with Flow Nexus
mcp__flow-nexus__user_login {
"email": "[email protected]",
"password": "secure_pass"
}
Training Not Converging:
Resource Limitations:
Deployment Failures:
development
Apple Human Interface Guidelines for content display components. Use this skill when the user asks about charts component, collection view, image view, web view, color well, image well, activity view, lockup, data visualization, content display, displaying images, rendering web content, color pickers, or presenting collections of items in Apple apps. Also use when the user says how should I display charts, what's the best way to show images, should I use a web view, how do I build a grid of items, what component shows media, or how do I present a share sheet. Cross-references: hig-foundations for color/typography/accessibility, hig-patterns for data visualization patterns, hig-components-layout for structural containers, hig-platforms for platform-specific component behavior.
tools
Automate HelpDesk tasks via Rube MCP (Composio): list tickets, manage views, use canned responses, and configure custom fields. Always search tools first for current schemas.
testing
Expert Haskell engineer specializing in advanced type systems, pure functional design, and high-reliability software. Use PROACTIVELY for type-level programming, concurrency, and architecture guidance.
tools
GraphQL gives clients exactly the data they need - no more, no less. One endpoint, typed schema, introspection. But the flexibility that makes it powerful also makes it dangerous. Without proper controls, clients can craft queries that bring down your server. This skill covers schema design, resolvers, DataLoader for N+1 prevention, federation for microservices, and client integration with Apollo/urql. Key insight: GraphQL is a contract. The schema is the API documentation. Design it carefully.