Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

frank-luongt/plugins/faos-data-scientist/skills/ml-pipeline-workflow

Name: plugins/faos-data-scientist/skills/ml-pipeline-workflow
Author: frank-luongt

plugins/faos-data-scientist/skills/ml-pipeline-workflow/SKILL.md

npx skillsauth add frank-luongt/faos-skills-marketplace plugins/faos-data-scientist/skills/ml-pipeline-workflow

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

name: ml-pipeline-workflow description: Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows. tags: [ml, pipelines, workflows]

ML Pipeline Workflow

Complete end-to-end MLOps pipeline orchestration from data preparation through model deployment.

Do not use this skill when

The task is unrelated to ml pipeline workflow
You need a different domain or tool outside this scope

Instructions

Clarify goals, constraints, and required inputs.
Apply relevant best practices and validate outcomes.
Provide actionable steps and verification.

Overview

This skill provides comprehensive guidance for building production ML pipelines that handle the full lifecycle: data ingestion → preparation → training → validation → deployment → monitoring.

Use this skill when

Building new ML pipelines from scratch
Designing workflow orchestration for ML systems
Implementing data → model → deployment automation
Setting up reproducible training workflows
Creating DAG-based ML orchestration
Integrating ML components into production systems

What This Skill Provides

Core Capabilities

Pipeline Architecture
- End-to-end workflow design
- DAG orchestration patterns (Airflow, Dagster, Kubeflow)
- Component dependencies and data flow
- Error handling and retry strategies
Data Preparation
- Data validation and quality checks
- Feature engineering pipelines
- Data versioning and lineage
- Train/validation/test splitting strategies
Model Training
- Training job orchestration
- Hyperparameter management
- Experiment tracking integration
- Distributed training patterns
Model Validation
- Validation frameworks and metrics
- A/B testing infrastructure
- Performance regression detection
- Model comparison workflows
Deployment Automation
- Model serving patterns
- Canary deployments
- Blue-green deployment strategies
- Rollback mechanisms

Reference Documentation

See the references/ directory for detailed guides:

data-preparation.md - Data cleaning, validation, and feature engineering
model-training.md - Training workflows and best practices
model-validation.md - Validation strategies and metrics
model-deployment.md - Deployment patterns and serving architectures

Assets and Templates

The assets/ directory contains:

pipeline-dag.yaml.template - DAG template for workflow orchestration
training-config.yaml - Training configuration template
validation-checklist.md - Pre-deployment validation checklist

Usage Patterns

Basic Pipeline Setup

# 1. Define pipeline stages
stages = [
    "data_ingestion",
    "data_validation",
    "feature_engineering",
    "model_training",
    "model_validation",
    "model_deployment"
]

# 2. Configure dependencies
# See assets/pipeline-dag.yaml.template for full example

Production Workflow

Data Preparation Phase
- Ingest raw data from sources
- Run data quality checks
- Apply feature transformations
- Version processed datasets
Training Phase
- Load versioned training data
- Execute training jobs
- Track experiments and metrics
- Save trained models
Validation Phase
- Run validation test suite
- Compare against baseline
- Generate performance reports
- Approve for deployment
Deployment Phase
- Package model artifacts
- Deploy to serving infrastructure
- Configure monitoring
- Validate production traffic

Best Practices

Pipeline Design

Modularity: Each stage should be independently testable
Idempotency: Re-running stages should be safe
Observability: Log metrics at every stage
Versioning: Track data, code, and model versions
Failure Handling: Implement retry logic and alerting

Data Management

Use data validation libraries (Great Expectations, TFX)
Version datasets with DVC or similar tools
Document feature engineering transformations
Maintain data lineage tracking

Model Operations

Separate training and serving infrastructure
Use model registries (MLflow, Weights & Biases)
Implement gradual rollouts for new models
Monitor model performance drift
Maintain rollback capabilities

Deployment Strategies

Start with shadow deployments
Use canary releases for validation
Implement A/B testing infrastructure
Set up automated rollback triggers
Monitor latency and throughput

Integration Points

Orchestration Tools

Apache Airflow: DAG-based workflow orchestration
Dagster: Asset-based pipeline orchestration
Kubeflow Pipelines: Kubernetes-native ML workflows
Prefect: Modern dataflow automation

Experiment Tracking

MLflow for experiment tracking and model registry
Weights & Biases for visualization and collaboration
TensorBoard for training metrics

Deployment Platforms

AWS SageMaker for managed ML infrastructure
Google Vertex AI for GCP deployments
Azure ML for Azure cloud
Kubernetes + KServe for cloud-agnostic serving

Progressive Disclosure

Start with the basics and gradually add complexity:

Level 1: Simple linear pipeline (data → train → deploy)
Level 2: Add validation and monitoring stages
Level 3: Implement hyperparameter tuning
Level 4: Add A/B testing and gradual rollouts
Level 5: Multi-model pipelines with ensemble strategies

Common Patterns

Batch Training Pipeline

# See assets/pipeline-dag.yaml.template
stages:
  - name: data_preparation
    dependencies: []
  - name: model_training
    dependencies: [data_preparation]
  - name: model_evaluation
    dependencies: [model_training]
  - name: model_deployment
    dependencies: [model_evaluation]

Real-time Feature Pipeline

# Stream processing for real-time features
# Combined with batch training
# See references/data-preparation.md

Continuous Training

# Automated retraining on schedule
# Triggered by data drift detection
# See references/model-training.md

Troubleshooting

Common Issues

Pipeline failures: Check dependencies and data availability
Training instability: Review hyperparameters and data quality
Deployment issues: Validate model artifacts and serving config
Performance degradation: Monitor data drift and model metrics

Debugging Steps

Check pipeline logs for each stage
Validate input/output data at boundaries
Test components in isolation
Review experiment tracking metrics
Inspect model artifacts and metadata

Next Steps

After setting up your pipeline:

Explore hyperparameter-tuning skill for optimization
Learn experiment-tracking-setup for MLflow/W&B
Review model-deployment-patterns for serving strategies
Implement monitoring with observability tools

Related Skills

experiment-tracking-setup: MLflow and Weights & Biases integration
hyperparameter-tuning: Automated hyperparameter optimization
model-deployment-patterns: Advanced deployment strategies

frank-luongt/plugins/faos-data-scientist/skills/ml-pipeline-workflow

plugins/faos-data-scientist/skills/ml-pipeline-workflow/SKILL.md

--- name: ml-pipeline-workflow description: Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows. tags: [ml, pipelines, workflows] --- # ML Pipeline Workflow Complete end-to-end MLOps pipeline orchestration from data preparation through model deployment. ##

12 stars

tools

Updated Apr 20, 2026

$ install --global

skillsauth

npx skillsauth add frank-luongt/faos-skills-marketplace plugins/faos-data-scientist/skills/ml-pipeline-workflow

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 20, 2026, 2:58 PM9.8s1 file scanned

SKILL.md

name: ml-pipeline-workflow description: Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows. tags: [ml, pipelines, workflows]

ML Pipeline Workflow

Complete end-to-end MLOps pipeline orchestration from data preparation through model deployment.

Do not use this skill when

The task is unrelated to ml pipeline workflow
You need a different domain or tool outside this scope

Instructions

Clarify goals, constraints, and required inputs.
Apply relevant best practices and validate outcomes.
Provide actionable steps and verification.

Overview

Use this skill when

Building new ML pipelines from scratch
Designing workflow orchestration for ML systems
Implementing data → model → deployment automation
Setting up reproducible training workflows
Creating DAG-based ML orchestration
Integrating ML components into production systems

What This Skill Provides

Core Capabilities

Pipeline Architecture
- End-to-end workflow design
- DAG orchestration patterns (Airflow, Dagster, Kubeflow)
- Component dependencies and data flow
- Error handling and retry strategies
Data Preparation
- Data validation and quality checks
- Feature engineering pipelines
- Data versioning and lineage
- Train/validation/test splitting strategies
Model Training
- Training job orchestration
- Hyperparameter management
- Experiment tracking integration
- Distributed training patterns
Model Validation
- Validation frameworks and metrics
- A/B testing infrastructure
- Performance regression detection
- Model comparison workflows
Deployment Automation
- Model serving patterns
- Canary deployments
- Blue-green deployment strategies
- Rollback mechanisms

Reference Documentation

See the references/ directory for detailed guides:

data-preparation.md - Data cleaning, validation, and feature engineering
model-training.md - Training workflows and best practices
model-validation.md - Validation strategies and metrics
model-deployment.md - Deployment patterns and serving architectures

Assets and Templates

The assets/ directory contains:

pipeline-dag.yaml.template - DAG template for workflow orchestration
training-config.yaml - Training configuration template
validation-checklist.md - Pre-deployment validation checklist

Usage Patterns

Basic Pipeline Setup

# 1. Define pipeline stages
stages = [
    "data_ingestion",
    "data_validation",
    "feature_engineering",
    "model_training",
    "model_validation",
    "model_deployment"
]

# 2. Configure dependencies
# See assets/pipeline-dag.yaml.template for full example

Production Workflow

Data Preparation Phase
- Ingest raw data from sources
- Run data quality checks
- Apply feature transformations
- Version processed datasets
Training Phase
- Load versioned training data
- Execute training jobs
- Track experiments and metrics
- Save trained models
Validation Phase
- Run validation test suite
- Compare against baseline
- Generate performance reports
- Approve for deployment
Deployment Phase
- Package model artifacts
- Deploy to serving infrastructure
- Configure monitoring
- Validate production traffic

Best Practices

Pipeline Design

Modularity: Each stage should be independently testable
Idempotency: Re-running stages should be safe
Observability: Log metrics at every stage
Versioning: Track data, code, and model versions
Failure Handling: Implement retry logic and alerting

Data Management

Use data validation libraries (Great Expectations, TFX)
Version datasets with DVC or similar tools
Document feature engineering transformations
Maintain data lineage tracking

Model Operations

Separate training and serving infrastructure
Use model registries (MLflow, Weights & Biases)
Implement gradual rollouts for new models
Monitor model performance drift
Maintain rollback capabilities

Deployment Strategies

Start with shadow deployments
Use canary releases for validation
Implement A/B testing infrastructure
Set up automated rollback triggers
Monitor latency and throughput

Integration Points

Orchestration Tools

Apache Airflow: DAG-based workflow orchestration
Dagster: Asset-based pipeline orchestration
Kubeflow Pipelines: Kubernetes-native ML workflows
Prefect: Modern dataflow automation

Experiment Tracking

MLflow for experiment tracking and model registry
Weights & Biases for visualization and collaboration
TensorBoard for training metrics

Deployment Platforms

AWS SageMaker for managed ML infrastructure
Google Vertex AI for GCP deployments
Azure ML for Azure cloud
Kubernetes + KServe for cloud-agnostic serving

Progressive Disclosure

Start with the basics and gradually add complexity:

Level 1: Simple linear pipeline (data → train → deploy)
Level 2: Add validation and monitoring stages
Level 3: Implement hyperparameter tuning
Level 4: Add A/B testing and gradual rollouts
Level 5: Multi-model pipelines with ensemble strategies

Common Patterns

Batch Training Pipeline

# See assets/pipeline-dag.yaml.template
stages:
  - name: data_preparation
    dependencies: []
  - name: model_training
    dependencies: [data_preparation]
  - name: model_evaluation
    dependencies: [model_training]
  - name: model_deployment
    dependencies: [model_evaluation]

Real-time Feature Pipeline

# Stream processing for real-time features
# Combined with batch training
# See references/data-preparation.md

Continuous Training

# Automated retraining on schedule
# Triggered by data drift detection
# See references/model-training.md

Troubleshooting

Common Issues

Pipeline failures: Check dependencies and data availability
Training instability: Review hyperparameters and data quality
Deployment issues: Validate model artifacts and serving config
Performance degradation: Monitor data drift and model metrics

Debugging Steps

Check pipeline logs for each stage
Validate input/output data at boundaries
Test components in isolation
Review experiment tracking metrics
Inspect model artifacts and metadata

Next Steps

After setting up your pipeline:

Explore hyperparameter-tuning skill for optimization
Learn experiment-tracking-setup for MLflow/W&B
Review model-deployment-patterns for serving strategies
Implement monitoring with observability tools

Related Skills

experiment-tracking-setup: MLflow and Weights & Biases integration
hyperparameter-tuning: Automated hyperparameter optimization
model-deployment-patterns: Advanced deployment strategies

Related Skills

frank-luongt/skills/codex/grpo-rl-training

development

VerifiedTrustedCommunity

--- name: grpo-rl-training description: GRPO reinforcement learning training with TRL. Use when applying Group Relative Policy Optimization for reasoning and task-specific model training. --- # GRPO/RL Training with TRL Expert-level guidance for implementing Group Relative Policy Optimization (GRPO) using the Transformer Reinforcement Learning (TRL) library. This skill provides battle-tested patterns, critical insights, and production-r

26SKILL.mdUpdated Jul 9, 2026

frank-luongt/skills/codex/grpo-rl-training

frank-luongt/skills/codex/graphql-architect

tools

VerifiedTrustedCommunity

--- name: graphql-architect description: Master modern GraphQL with federation, performance optimization, --- ## Use this skill when - Working on graphql architect tasks or workflows - Needing guidance, best practices, or checklists for graphql architect ## Do not use this skill when - The task is unrelated to graphql architect - You need a different domain or tool outside this scope ## Instructions - Clarify goals, constraints, and

26SKILL.mdUpdated Jul 9, 2026

frank-luongt/skills/codex/graphql-architect

frank-luongt/skills/codex/grafana-dashboards

development

VerifiedTrustedCommunity

--- name: grafana-dashboards description: Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces. --- # Grafana Dashboards Create and manage production-ready Grafana dashboards for comprehensive system observability. ## Do not use this skill when - The task is unrelated

26SKILL.mdUpdated Jul 9, 2026

frank-luongt/skills/codex/grafana-dashboards

frank-luongt/skills/codex/gptq

development

VerifiedTrustedCommunity

--- name: gptq description: GPTQ post-training quantization for generative models. Use when quantizing large models to 4-bit with calibration-based weight compression. --- # GPTQ (Generative Pre-trained Transformer Quantization) Post-training quantization method that compresses LLMs to 4-bit with minimal accuracy loss using group-wise quantization. ## When to use GPTQ **Use GPTQ when:** - Need to fit large models (70B+) on limited GPU

26SKILL.mdUpdated Jul 9, 2026

frank-luongt/skills/codex/gptq

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/frank-luongt/faos-skills-marketplace.git

# Copy into Claude Code skills folder (global)
cp -r faos-skills-marketplace/plugins/faos-data-scientist/skills/ml-pipeline-workflow ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

frank-luongt/faos-skills-marketplace

12 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT