skills/technology-mastery/cicd-devops-expertise/SKILL.md
# CI/CD and DevOps Toolchain Expertise ## Description Expert-level knowledge of CI/CD platforms, DevOps toolchains, and automation frameworks including Jenkins, GitLab CI, GitHub Actions, Azure DevOps, CircleCI, infrastructure as code tools (Terraform, Pulumi), configuration management (Ansible, Chef, Puppet), and deployment strategies with optimization techniques and best practices. ## When to Use - Designing and implementing CI/CD pipelines for complex applications - Optimizing build and dep
npx skillsauth add pauljbernard/headelf skills/technology-mastery/cicd-devops-expertiseInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Expert-level knowledge of CI/CD platforms, DevOps toolchains, and automation frameworks including Jenkins, GitLab CI, GitHub Actions, Azure DevOps, CircleCI, infrastructure as code tools (Terraform, Pulumi), configuration management (Ansible, Chef, Puppet), and deployment strategies with optimization techniques and best practices.
You are a CI/CD and DevOps expert with deep knowledge of automation platforms, infrastructure as code, deployment strategies, and optimization techniques for software delivery pipelines.
Jenkins Architecture Components:
Jenkins Master/Controller:
├── Web Interface: User interface and REST API
├── Build Scheduler: Job queue management and execution coordination
├── Plugin Manager: Plugin installation and lifecycle management
├── Security Realm: Authentication and authorization
├── Node Management: Agent registration and communication
├── Configuration Storage: Job and system configuration persistence
├── Build History: Build artifacts and log storage
└── Workspace Management: Build workspace allocation and cleanup
Jenkins Agent/Node Architecture:
├── Agent Connection:
│ ├── JNLP (Java Web Start): Agent-initiated connection
│ ├── SSH: SSH-based agent connection
│ ├── Docker: Container-based ephemeral agents
│ ├── Kubernetes: Pod-based dynamic agent provisioning
│ └── Cloud Agents: AWS, Azure, GCP auto-scaling agents
├── Executor Management:
│ ├── Concurrent Builds: Multiple executor configuration
│ ├── Resource Allocation: CPU and memory per executor
│ ├── Label-Based Routing: Job assignment based on agent capabilities
│ ├── Load Balancing: Workload distribution across agents
│ └── Queue Management: Build queue optimization
Pipeline Architecture:
├── Pipeline DSL: Groovy-based pipeline definition
├── Stage Execution: Sequential and parallel stage execution
├── Shared Libraries: Reusable pipeline code
├── Pipeline Scripts: Declarative and scripted pipelines
├── Blue Ocean: Modern pipeline visualization
└── Pipeline Optimization: Performance and resource optimization
class JenkinsOptimizer:
def __init__(self):
self.performance_analyzer = JenkinsPerformanceAnalyzer()
self.plugin_manager = JenkinsPluginOptimizer()
self.pipeline_optimizer = PipelineOptimizer()
def optimize_jenkins_deployment(self, jenkins_requirements):
"""Optimize Jenkins for performance and scalability"""
# Master configuration optimization
master_config = {
'system_properties': {
'hudson.model.DirectoryBrowserSupport.CSP': '',
'jenkins.model.Jenkins.slaveAgentPort': 50000,
'jenkins.model.Jenkins.slaveAgentPortEnforce': True,
'hudson.security.csrf.GlobalCrumbIssuerConfiguration.DISABLE_CSRF_PROTECTION': False,
'hudson.model.ParametersAction.keepUndefinedParameters': False
},
'jvm_options': [
f'-Xms{jenkins_requirements.heap_min}',
f'-Xmx{jenkins_requirements.heap_max}',
'-XX:+UseG1GC',
'-XX:+UseStringDeduplication',
'-XX:+DisableExplicitGC',
'-XX:+ParallelRefProcEnabled',
f'-Djava.awt.headless=true'
],
'security': {
'csrf_protection': True,
'agent_to_master_security': True,
'matrix_authorization': self.generate_authorization_matrix(jenkins_requirements.users),
'script_security': True
}
}
# Agent configuration optimization
agent_config = {
'kubernetes_agents': {
'enabled': True,
'namespace': 'jenkins',
'pod_template': {
'yaml': self.generate_optimal_pod_template(jenkins_requirements),
'idle_minutes': 5,
'max_concurrent_builds': jenkins_requirements.max_concurrent_builds
}
},
'static_agents': self.configure_static_agents(jenkins_requirements.static_agents),
'cloud_agents': {
'aws_ec2': self.configure_aws_agents(jenkins_requirements.aws_config),
'retention_strategy': 'demand',
'instance_cap': jenkins_requirements.max_cloud_instances
}
}
# Pipeline optimization
pipeline_optimizations = {
'shared_libraries': self.design_shared_library_structure(),
'pipeline_templates': self.generate_pipeline_templates(),
'parallel_execution': self.optimize_parallel_stages(),
'caching_strategy': self.design_build_caching(),
'artifact_management': self.optimize_artifact_handling()
}
# Plugin optimization
essential_plugins = [
'workflow-aggregator', # Pipeline plugin suite
'kubernetes', # Kubernetes agents
'docker-workflow', # Docker pipeline steps
'pipeline-stage-view', # Pipeline visualization
'blueocean', # Modern UI
'github-branch-source', # GitHub integration
'slack', # Notifications
'build-timeout', # Build timeouts
'timestamper', # Build timestamps
'ws-cleanup' # Workspace cleanup
]
return JenkinsOptimization(
master_config=master_config,
agent_config=agent_config,
pipeline_optimizations=pipeline_optimizations,
recommended_plugins=essential_plugins,
monitoring_setup=self.design_jenkins_monitoring()
)
def design_enterprise_jenkins_architecture(self, enterprise_requirements):
"""Design enterprise-grade Jenkins architecture"""
architecture = {
'high_availability': {
'active_passive': {
'shared_storage': 'NFS or AWS EFS',
'database': 'PostgreSQL with replication',
'load_balancer': 'HAProxy or AWS ALB',
'failover_time': '< 5 minutes'
},
'multi_master': {
'federation': 'Jenkins federation plugin',
'load_distribution': 'Geographic or team-based',
'shared_agents': 'Cross-master agent sharing',
'unified_dashboard': 'Blue Ocean aggregated view'
}
},
'scalability': {
'horizontal_scaling': {
'master_per_team': True,
'shared_agent_pool': True,
'centralized_monitoring': True,
'unified_authentication': 'LDAP/SAML SSO'
},
'agent_scaling': {
'kubernetes_autoscaling': True,
'cloud_bursting': 'AWS/Azure auto-scaling',
'spot_instances': 'Cost optimization',
'resource_quotas': 'Team-based limits'
}
},
'security': {
'network_security': {
'master_isolation': 'Private subnet deployment',
'agent_communication': 'Encrypted JNLP/SSH',
'external_access': 'VPN or bastion host',
'firewall_rules': 'Minimal required ports'
},
'access_control': {
'rbac': 'Role-based access control',
'project_isolation': 'Folder-based permissions',
'secret_management': 'Credentials plugin + external vault',
'audit_logging': 'Comprehensive audit trail'
}
}
}
return EnterpriseJenkinsArchitecture(
architecture_design=architecture,
deployment_strategy=self.design_jenkins_deployment_strategy(),
backup_strategy=self.design_jenkins_backup_strategy(),
disaster_recovery=self.design_jenkins_dr_strategy()
)
Advanced Jenkins Patterns:
Pipeline as Code Best Practices:
├── Declarative Pipelines: Structured, readable pipeline definitions
├── Shared Libraries: Reusable pipeline components and functions
├── Parameter Validation: Input validation and type checking
├── Error Handling: Comprehensive error handling and recovery
├── Parallel Execution: Optimized parallel stage execution
├── Matrix Builds: Multi-configuration build execution
├── Pipeline Visualization: Clear stage progression and status
└── Integration Testing: Automated pipeline testing
Blue-Green Deployment Pipeline:
├── Environment Provisioning: Automated infrastructure deployment
├── Application Deployment: Zero-downtime deployment strategy
├── Health Checks: Comprehensive application health validation
├── Traffic Switching: Load balancer configuration updates
├── Rollback Capability: Automated rollback on failure detection
├── Monitoring Integration: Deployment success monitoring
└── Cleanup: Automated cleanup of old environments
Multi-Branch Pipeline Optimization:
├── Branch Strategy: Git flow and feature branch handling
├── PR/MR Validation: Automated pull request validation
├── Branch Indexing: Efficient branch discovery and management
├── Resource Optimization: Selective build triggering
├── Artifact Promotion: Cross-branch artifact promotion
├── Webhook Integration: Real-time build triggering
└── Cleanup Policies: Automated branch cleanup
GitLab CI/CD Architecture:
GitLab Runner Architecture:
├── Runner Types:
│ ├── Shared Runners: Multi-project runner sharing
│ ├── Group Runners: Group-level runner assignment
│ ├── Project Runners: Project-specific runners
│ └── Instance Runners: GitLab.com hosted runners
├── Executor Types:
│ ├── Docker Executor: Container-based job execution
│ ├── Kubernetes Executor: Pod-based dynamic execution
│ ├── Shell Executor: Direct shell command execution
│ ├── SSH Executor: Remote SSH-based execution
│ └── VirtualBox/Parallels: VM-based isolation
├── Pipeline Architecture:
│ ├── Pipeline Configuration: .gitlab-ci.yml definition
│ ├── Stage Execution: Sequential stage processing
│ ├── Job Parallelization: Concurrent job execution
│ ├── Dependency Management: Job and stage dependencies
│ ├── Artifact Handling: Build artifact management
│ └── Cache Management: Build cache optimization
class GitLabOptimizer:
def __init__(self):
self.runner_optimizer = GitLabRunnerOptimizer()
self.pipeline_analyzer = GitLabPipelineAnalyzer()
self.cache_optimizer = GitLabCacheOptimizer()
def optimize_gitlab_runners(self, runner_requirements):
"""Optimize GitLab Runners for performance and cost"""
# Docker executor optimization
docker_config = {
'executor': 'docker',
'docker': {
'image': 'docker:20.10.16',
'privileged': True,
'disable_entrypoint_overwrite': False,
'oom_kill_disable': False,
'disable_cache': False,
'cache_dir': '/cache',
'volumes': [
'/var/run/docker.sock:/var/run/docker.sock',
'/cache',
'/builds:/builds'
],
'shm_size': 268435456, # 256MB
'pull_policy': ['if-not-present', 'always'],
'allowed_images': runner_requirements.allowed_images,
'allowed_services': runner_requirements.allowed_services
},
'cache': {
'type': 's3',
'shared': True,
's3': {
'server_address': runner_requirements.cache_s3_endpoint,
'bucket_name': runner_requirements.cache_bucket,
'bucket_location': runner_requirements.cache_region,
'insecure': False
}
}
}
# Kubernetes executor optimization
kubernetes_config = {
'executor': 'kubernetes',
'kubernetes': {
'host': runner_requirements.k8s_api_server,
'namespace': 'gitlab-runner',
'privileged': True,
'cpu_limit': runner_requirements.cpu_limit,
'memory_limit': runner_requirements.memory_limit,
'service_cpu_limit': runner_requirements.service_cpu_limit,
'service_memory_limit': runner_requirements.service_memory_limit,
'helper_cpu_limit': '100m',
'helper_memory_limit': '128Mi',
'poll_interval': 3,
'poll_timeout': 180,
'pod_annotations': {
'cluster-autoscaler.kubernetes.io/safe-to-evict': 'true'
},
'node_selector': runner_requirements.node_selector,
'image_pull_secrets': ['gitlab-registry']
}
}
# Auto-scaling configuration
autoscaling_config = {
'concurrent': runner_requirements.max_concurrent_jobs,
'check_interval': 3,
'session_timeout': 1800,
'machine': {
'idle_count': 1,
'idle_time': 300,
'max_builds': 10,
'machine_driver': 'amazonec2',
'machine_name': 'gitlab-runner-%s',
'machine_options': [
f'amazonec2-instance-type={runner_requirements.instance_type}',
f'amazonec2-region={runner_requirements.region}',
f'amazonec2-vpc-id={runner_requirements.vpc_id}',
f'amazonec2-subnet-id={runner_requirements.subnet_id}',
f'amazonec2-security-group={runner_requirements.security_group}',
'amazonec2-use-private-address=true',
'amazonec2-request-spot-instance=true',
f'amazonec2-spot-price={runner_requirements.spot_price}'
]
}
}
return GitLabRunnerOptimization(
docker_config=docker_config,
kubernetes_config=kubernetes_config,
autoscaling_config=autoscaling_config,
monitoring_setup=self.design_runner_monitoring()
)
def optimize_pipeline_performance(self, pipeline_requirements):
"""Optimize GitLab CI pipeline for speed and efficiency"""
# Pipeline structure optimization
pipeline_optimizations = {
'stages': self.optimize_stage_structure(pipeline_requirements.pipeline_complexity),
'parallel_jobs': {
'test_parallelization': {
'parallel': min(10, pipeline_requirements.test_suite_size // 100),
'matrix_strategy': self.design_test_matrix(pipeline_requirements.test_environments)
},
'build_parallelization': {
'parallel': {
'matrix': pipeline_requirements.build_matrix
}
}
},
'job_optimization': {
'image_optimization': {
'base_images': self.recommend_base_images(pipeline_requirements.tech_stack),
'layer_caching': True,
'multi_stage_builds': True
},
'script_optimization': {
'early_exit': 'set -e',
'verbose_output': 'set -x',
'parallel_commands': True
}
}
}
# Cache optimization strategy
cache_strategy = {
'global_cache': {
'key': '${CI_COMMIT_REF_SLUG}',
'paths': ['node_modules/', '.gradle/', 'target/', 'vendor/'],
'policy': 'pull-push'
},
'job_specific_cache': {
'build_cache': {
'key': '${CI_COMMIT_SHA}',
'paths': ['dist/', 'build/'],
'policy': 'push'
},
'test_cache': {
'key': '${CI_COMMIT_REF_SLUG}',
'paths': ['.pytest_cache/', 'coverage/'],
'policy': 'pull'
}
},
'cache_fallback': {
'key': 'default-cache',
'policy': 'pull'
}
}
# Artifact optimization
artifact_strategy = {
'selective_artifacts': {
'build_artifacts': {
'paths': ['dist/', 'target/*.jar', '*.war'],
'expire_in': '1 week',
'when': 'on_success'
},
'test_artifacts': {
'paths': ['coverage/', 'test-results/'],
'reports': {
'junit': 'test-results/*.xml',
'coverage_report': {
'coverage_format': 'cobertura',
'path': 'coverage/cobertura.xml'
}
},
'expire_in': '1 day'
}
}
}
return GitLabPipelineOptimization(
pipeline_structure=pipeline_optimizations,
cache_strategy=cache_strategy,
artifact_strategy=artifact_strategy,
performance_metrics=self.design_pipeline_metrics()
)
GitLab CI/CD Advanced Patterns:
Multi-Project Pipelines:
├── Upstream/Downstream: Cross-project pipeline triggering
├── Dependency Management: Project dependency orchestration
├── Artifact Passing: Cross-project artifact sharing
├── Pipeline Visualization: Multi-project pipeline graphs
├── Security Scanning: Centralized security scanning
└── Deployment Coordination: Multi-service deployment orchestration
Auto DevOps Integration:
├── Auto Build: Automatic build detection and execution
├── Auto Test: Automated testing based on project type
├── Auto Code Quality: Code quality scanning and reporting
├── Auto SAST/DAST: Security testing automation
├── Auto Deploy: Kubernetes-based automated deployment
├── Auto Monitoring: Prometheus-based application monitoring
└── Auto Review Apps: Dynamic environment provisioning
Terraform Expertise:
Terraform Architecture and Best Practices:
├── State Management:
│ ├── Remote State: S3, Terraform Cloud, Azure Storage backends
│ ├── State Locking: DynamoDB, Azure Storage Account locking
│ ├── State Encryption: Encryption at rest and in transit
│ ├── State Versioning: State backup and versioning
│ └── Workspace Management: Environment-specific workspaces
├── Module Design:
│ ├── Module Structure: Standardized module organization
│ ├── Input Validation: Variable validation and type checking
│ ├── Output Design: Comprehensive output definitions
│ ├── Module Versioning: Semantic versioning for modules
│ ├── Module Registry: Private and public module registries
│ └── Module Testing: Automated module testing with Terratest
├── Provider Optimization:
│ ├── Provider Versions: Pinned provider versions
│ ├── Multiple Providers: Multi-cloud and multi-region providers
│ ├── Provider Configuration: Optimal provider settings
│ ├── Authentication: Secure credential management
│ └── Rate Limiting: API rate limit handling
class TerraformOptimizer:
def __init__(self):
self.state_analyzer = TerraformStateAnalyzer()
self.performance_optimizer = TerraformPerformanceOptimizer()
self.security_validator = TerraformSecurityValidator()
def optimize_terraform_architecture(self, infrastructure_requirements):
"""Optimize Terraform for large-scale infrastructure management"""
# State management optimization
state_config = {
'backend': {
'type': 's3',
'config': {
'bucket': infrastructure_requirements.state_bucket,
'key': '${var.environment}/${var.component}/terraform.tfstate',
'region': infrastructure_requirements.primary_region,
'encrypt': True,
'dynamodb_table': infrastructure_requirements.lock_table,
'versioning': True
}
},
'workspace_strategy': {
'pattern': 'environment-based',
'workspaces': ['dev', 'staging', 'prod'],
'isolation_level': 'complete'
}
}
# Module architecture
module_architecture = {
'foundational_modules': {
'networking': {
'vpc_module': self.design_vpc_module(),
'security_group_module': self.design_security_group_module(),
'load_balancer_module': self.design_load_balancer_module()
},
'compute': {
'ec2_module': self.design_ec2_module(),
'autoscaling_module': self.design_autoscaling_module(),
'ecs_module': self.design_ecs_module(),
'eks_module': self.design_eks_module()
},
'storage': {
'rds_module': self.design_rds_module(),
'elasticache_module': self.design_elasticache_module(),
's3_module': self.design_s3_module()
}
},
'composition_modules': {
'application_stack': self.design_application_stack_module(),
'monitoring_stack': self.design_monitoring_stack_module(),
'security_stack': self.design_security_stack_module()
}
}
# Performance optimizations
performance_config = {
'parallelism': min(20, infrastructure_requirements.resource_count // 50),
'provider_optimization': {
'aws': {
'max_retries': 3,
'skip_credentials_validation': False,
'skip_get_ec2_platforms': True,
'skip_region_validation': False,
'skip_requesting_account_id': False
}
},
'plan_optimization': {
'refresh': True,
'target_specific_resources': infrastructure_requirements.incremental_deployments,
'destroy_plan': infrastructure_requirements.enable_destroy_plans
}
}
# Security and compliance
security_config = {
'policy_as_code': {
'opa_policies': self.generate_opa_policies(),
'sentinel_policies': self.generate_sentinel_policies(),
'checkov_rules': self.generate_checkov_rules()
},
'secret_management': {
'provider': 'aws_ssm' or 'vault' or 'azure_keyvault',
'encryption_key': infrastructure_requirements.encryption_key,
'access_control': 'iam_roles'
},
'compliance_scanning': {
'tools': ['tfsec', 'checkov', 'terrascan'],
'automated_scanning': True,
'compliance_frameworks': ['CIS', 'PCI-DSS', 'SOC2']
}
}
return TerraformOptimization(
state_config=state_config,
module_architecture=module_architecture,
performance_config=performance_config,
security_config=security_config,
ci_cd_integration=self.design_terraform_cicd()
)
def design_multi_environment_strategy(self, environment_requirements):
"""Design multi-environment Terraform strategy"""
strategy = {
'directory_structure': {
'pattern': 'environment-separated',
'structure': {
'environments/': {
'dev/': 'Development environment configuration',
'staging/': 'Staging environment configuration',
'prod/': 'Production environment configuration'
},
'modules/': {
'networking/': 'Networking module definitions',
'compute/': 'Compute module definitions',
'storage/': 'Storage module definitions'
},
'shared/': 'Shared configuration and policies'
}
},
'variable_management': {
'tfvars_files': {
'common.tfvars': 'Shared variables across environments',
'environment-specific.tfvars': 'Environment-specific overrides'
},
'variable_hierarchy': [
'terraform.tfvars',
'environment.tfvars',
'TF_VAR_* environment variables',
'CLI arguments'
]
},
'deployment_strategy': {
'promotion_pipeline': {
'dev_to_staging': 'Automated promotion on successful tests',
'staging_to_prod': 'Manual approval with comprehensive validation'
},
'rollback_strategy': {
'state_backup': 'Pre-deployment state backup',
'blue_green': 'Blue-green infrastructure deployment'
}
}
}
return MultiEnvironmentStrategy(
strategy=strategy,
automation_scripts=self.generate_automation_scripts(),
monitoring_setup=self.design_terraform_monitoring()
)
Terraform Advanced Patterns:
Dynamic Configuration:
├── Data Sources: External data integration and lookups
├── Local Values: Computed local variables and expressions
├── Conditional Resources: Resource creation based on conditions
├── For Expressions: List and map transformations
├── Dynamic Blocks: Dynamic resource configuration
└── Template Functions: String interpolation and formatting
State Management Patterns:
├── State Splitting: Logical separation of infrastructure components
├── State Import: Importing existing infrastructure
├── State Migration: Moving resources between states
├── Resource Targeting: Selective resource operations
├── State Refresh: Synchronizing state with real infrastructure
└── State Recovery: Disaster recovery procedures
Multi-Cloud Strategies:
├── Provider Aliasing: Multiple provider configurations
├── Cross-Cloud Dependencies: Resource dependencies across clouds
├── Cloud-Agnostic Modules: Portable infrastructure modules
├── Cost Optimization: Multi-cloud cost comparison
├── Disaster Recovery: Cross-cloud failover strategies
└── Compliance: Multi-cloud security and compliance
This CI/CD and DevOps expertise provides HeadElf with comprehensive knowledge of all major CI/CD platforms, infrastructure as code tools, and automation strategies for modern software delivery.
tools
# Security Tools and Frameworks Expertise ## Description Expert-level knowledge of cybersecurity tools, frameworks, and platforms including SIEM systems, vulnerability scanners, penetration testing tools, security orchestration platforms, identity and access management systems, and security automation frameworks with implementation strategies and optimization techniques. ## When to Use - Designing comprehensive security architectures for enterprise systems - Implementing security automation an
tools
# Monitoring and Observability Tools Expertise ## Description Expert-level knowledge of monitoring, observability, and APM (Application Performance Monitoring) tools including Prometheus, Grafana, Jaeger, OpenTelemetry, Elasticsearch, Datadog, New Relic, and cloud-native observability platforms with internal architectures, optimization techniques, and implementation strategies. ## When to Use - Designing comprehensive observability strategies for distributed systems - Implementing monitoring s
tools
# Machine Learning and AI Frameworks Expertise ## Description Expert-level knowledge of machine learning and AI frameworks including TensorFlow, PyTorch, Scikit-learn, Hugging Face, MLflow, Kubeflow, Apache Spark ML, cloud ML platforms, and MLOps tools with optimization techniques, deployment strategies, and production implementation patterns. ## When to Use - Designing and implementing machine learning pipelines and infrastructure - Selecting optimal ML frameworks for specific use cases and r
development
# Message Queue and Streaming Technology Expertise ## Description Expert-level knowledge of message queue systems, event streaming platforms, and asynchronous communication architectures including internal implementations, optimization techniques, failure scenarios, and selection criteria. ## When to Use - Designing high-throughput, low-latency messaging systems - Implementing event-driven architectures and microservices communication - Building real-time data streaming and processing pipeline