AWS Cost Optimization & FinOps

Systematic workflows for AWS cost optimization and financial operations management.

When to Use This Skill

Use this skill when you need to:

Find cost savings: Identify unused resources, rightsizing opportunities, or commitment discounts
Analyze spending: Understand cost trends, detect anomalies, or break down costs
Optimize architecture: Choose cost-effective services, storage tiers, or instance types
Implement FinOps: Set up governance, tagging, budgets, or monthly reviews
Make purchase decisions: Evaluate Reserved Instances, Savings Plans, or Spot instances
Troubleshoot costs: Investigate unexpected bills or cost spikes
Plan budgets: Forecast costs or evaluate impact of new projects

Cost Optimization Workflow

Follow this systematic approach for AWS cost optimization:

┌─────────────────────────────────────────────┐
│ 1. DISCOVER                                 │
│    What are we spending money on?           │
│    Run: find_unused_resources.py            │
│    Run: cost_anomaly_detector.py            │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│ 2. ANALYZE                                  │
│    Where are the optimization opportunities?│
│    Run: rightsizing_analyzer.py             │
│    Run: detect_old_generations.py           │
│    Run: spot_recommendations.py             │
│    Run: analyze_ri_recommendations.py       │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│ 3. PRIORITIZE                               │
│    What should we optimize first?           │
│    - Quick wins (low risk, high savings)    │
│    - Low-hanging fruit (easy to implement)  │
│    - Strategic improvements                 │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│ 4. IMPLEMENT                                │
│    Execute optimization actions             │
│    - Delete unused resources                │
│    - Rightsize instances                    │
│    - Purchase commitments                   │
│    - Migrate to new generations             │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│ 5. MONITOR                                  │
│    Verify savings and track metrics         │
│    - Monthly cost reviews                   │
│    - Tag compliance monitoring              │
│    - Budget variance tracking               │
└─────────────────────────────────────────────┘

Core Workflows

Workflow 1: Monthly Cost Optimization Review

Frequency: Run monthly (first week of each month)

Step 1: Find Unused Resources

# Scan for waste across all resources
python3 scripts/find_unused_resources.py

# Expected output:
# - Unattached EBS volumes
# - Old snapshots
# - Unused Elastic IPs
# - Idle NAT Gateways
# - Idle EC2 instances
# - Unused load balancers
# - Estimated monthly savings
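
The unattached-volume part of this scan can be sketched as follows. This is a minimal illustration, not the contents of find_unused_resources.py: the function names and the per-GiB prices are assumptions (check current pricing for your Region). The input has the shape of the "Volumes" list returned by boto3's describe_volumes.

```python
# Assumed USD prices per GiB-month; illustrative only, verify against current AWS pricing.
ASSUMED_PRICE_PER_GIB = {"gp2": 0.10, "gp3": 0.08}
DEFAULT_PRICE_PER_GIB = 0.10  # fallback for volume types not listed above


def unattached_volume_savings(volumes):
    """Return (findings, total_monthly_savings) for volumes in the 'available' state."""
    findings = []
    for vol in volumes:
        if vol.get("State") != "available":  # 'available' means not attached
            continue
        price = ASSUMED_PRICE_PER_GIB.get(vol.get("VolumeType"), DEFAULT_PRICE_PER_GIB)
        findings.append({
            "VolumeId": vol["VolumeId"],
            "SizeGiB": vol["Size"],
            "MonthlyCost": round(vol["Size"] * price, 2),
        })
    total = round(sum(f["MonthlyCost"] for f in findings), 2)
    return findings, total


def fetch_volumes(region_name=None):
    """Fetch all volumes with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the pure function above works without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    volumes = []
    for page in ec2.get_paginator("describe_volumes").paginate():
        volumes.extend(page["Volumes"])
    return volumes
```

Keeping the cost arithmetic separate from the API call makes the estimate easy to check against sample data before pointing it at a real account.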

Step 2: Analyze Cost Anomalies

# Detect unusual spending patterns
python3 scripts/cost_anomaly_detector.py --days 30

# Expected output:
# - Cost spikes and anomalies
# - Top cost drivers
# - Period-over-period comparison
# - 30-day forecast
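
One simple way to flag cost spikes is to compare each day against a trailing baseline. The sketch below is an assumption about how such a check could work, not a description of cost_anomaly_detector.py; the function name, the 7-day window, and the 3-sigma threshold are all illustrative. Daily totals could come from Cost Explorer's get_cost_and_usage with DAILY granularity.

```python
from statistics import mean, stdev


def find_cost_spikes(daily_costs, window=7, threshold=3.0):
    """Flag days whose cost exceeds the trailing mean by `threshold` standard deviations.

    `daily_costs` is a list of (date_string, cost) tuples in chronological order.
    Returns (date, cost, baseline) tuples for each flagged day.
    """
    spikes = []
    for i in range(window, len(daily_costs)):
        history = [cost for _, cost in daily_costs[i - window:i]]
        baseline = mean(history)
        spread = stdev(history)
        day, cost = daily_costs[i]
        # Guard against a zero spread when the trailing window is perfectly flat.
        limit = baseline + threshold * max(spread, 0.01 * baseline)
        if cost > limit:
            spikes.append((day, cost, round(baseline, 2)))
    return spikes
```

A trailing window adapts to gradual growth, so only abrupt jumps are reported; a spike inflates the baseline for the following days, which is worth remembering when two spikes occur close together.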

Step 3: Identify Rightsizing Opportunities

# Find oversized instances
python3 scripts/rightsizing_analyzer.py --days 30

# Expected output:
# - EC2 instances with low utilization
# - RDS instances with low utilization
# - Recommended smaller instance types
# - Estimated savings
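
A rightsizing rule of this kind can be sketched as below. The thresholds (average CPU under 20%, peak under 40%), the size ladder, and the function name are assumptions for illustration, not the logic of rightsizing_analyzer.py. CPU figures could come from CloudWatch's CPUUtilization metric; memory is not considered here because CloudWatch does not report it unless the CloudWatch agent is installed.

```python
SIZES = ["nano", "micro", "small", "medium", "large", "xlarge", "2xlarge", "4xlarge"]


def suggest_smaller_type(instance_type, avg_cpu, max_cpu, offered_types,
                         avg_limit=20.0, max_limit=40.0):
    """Return the next size down in the same family if CPU stayed low, else None.

    `offered_types` is the set of instance types that actually exist, since not
    every family offers every size (m5, for example, starts at large).
    """
    family, _, size = instance_type.partition(".")
    if size not in SIZES or SIZES.index(size) == 0:
        return None  # unknown size or already the smallest
    if avg_cpu >= avg_limit or max_cpu >= max_limit:
        return None  # utilization too high to downsize safely
    candidate = f"{family}.{SIZES[SIZES.index(size) - 1]}"
    return candidate if candidate in offered_types else None
```

Checking the peak as well as the average avoids downsizing instances that are idle most of the day but busy during a short batch window.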

Step 4: Generate Monthly Report

# Use the template to compile findings
mkdir -p reports
cp assets/templates/monthly_cost_report.md "reports/$(date +%Y-%m)-cost-report.md"

# Fill in:
# - Findings from scripts
# - Action items
# - Team cost breakdowns
# - Optimization wins

Step 5: Team Review Meeting

Present findings to engineering teams
Assign optimization tasks
Track action items to completion

Workflow 2: Commitment Purchase Analysis (RI/Savings Plans)

When: Quarterly or when usage patterns stabilize

Step 1: Analyze Current Usage

# Identify workloads suitable for commitments
python3 scripts/analyze_ri_recommendations.py --days 60

# Looks for:
# - EC2 instances running consistently for 60+ days
# - RDS instances with stable usage
# Also calculates ROI for 1-year vs 3-year commitments

Step 2: Review Recommendations

Evaluate each recommendation:

✅ Good candidate if:
  - Running 24/7 for 60+ days
  - Workload is stable and predictable
  - No plans to change architecture
  - Savings > 30%

❌ Poor candidate if:
  - Workload is variable or experimental
  - Architecture changes planned
  - Instance type may change
  - Dev/test environment
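
The reasoning behind these criteria is a break-even calculation: a commitment is billed for every hour of its term, so it only beats On-Demand if the workload runs for more than (1 - discount) of those hours. A minimal sketch, with hypothetical function names and an assumed 8,760-hour year:

```python
def break_even_utilization(discount):
    """Fraction of hours a workload must run for a commitment to beat On-Demand."""
    if not 0 < discount < 1:
        raise ValueError("discount must be between 0 and 1")
    return 1 - discount


def commitment_net_savings(on_demand_hourly, discount, utilization, hours=8760):
    """Savings over `hours` versus paying On-Demand only for the hours actually used.

    Negative values mean the commitment costs more than On-Demand would have.
    """
    on_demand_cost = on_demand_hourly * hours * utilization
    committed_cost = on_demand_hourly * (1 - discount) * hours  # billed every hour
    return on_demand_cost - committed_cost
```

With a 30% discount the break-even point is 70% utilization, which is why a dev/test instance that runs only during working hours is a poor candidate even though the headline discount looks attractive.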

Step 3: Choose Commitment Type

Reserved Instances:

Standard RI: Highest discount (up to 72% off On-Demand), least flexibility
Convertible RI: Lower discount (up to 66%), can be exchanged for a different instance family or type
Best for: Specific instance types, stable workloads

Savings Plans:

Compute SP: Flexible across instance families, sizes, and Regions, and also covers Fargate and Lambda (up to 66% savings)
EC2 Instance SP: Flexible across sizes within one instance family in one Region (up to 72% savings)
Best for: Variable workloads within constraints

Decision Matrix:

Known instance type, won't change → Standard RI
May need to change types → Convertible RI or Compute SP
Variable workloads → Compute Savings Plan
Maximum flexibility → Compute Savings Plan

Step 4: Purchase and Track

Purchase through AWS Console or CLI
Tag commitments with purchase date and owner
Monitor utilization monthly
Aim for >90% utilization

Reference: See references/best_practices.md for detailed commitment strategies

Workflow 3: Instance Generation Migration

When: During architecture reviews or optimization sprints

Step 1: Detect Old Instances

# Find outdated instance generations
python3 scripts/detect_old_generations.py

# Identifies:
# - t2 → t3 migrations (10% savings)
# - m4 → m5 → m6i migrations
# - Intel → Graviton opportunities (20% savings)

Step 2: Prioritize Migrations

Quick Wins (Low Risk):

t2 → t3: Near drop-in replacement on current AMIs, 10% savings
m4 → m5: Better performance, 5% savings
gp2 → gp3: No downtime, 20% savings

Medium Effort (Test Required):

x86 → Graviton (ARM64): 20% savings
- Requires ARM64 compatibility testing
- Most modern frameworks support ARM64
- Test in staging first
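
The quick-win mappings above can be expressed as a small lookup. This is a sketch with a hypothetical function name, not the logic of detect_old_generations.py; the savings figures are the ones quoted in this section, and the size sets are restricted to sizes that exist in both the old and the new family.

```python
# (new family, savings percent from the list above, sizes offered by both families)
UPGRADES = {
    "t2": ("t3", 10, {"nano", "micro", "small", "medium", "large", "xlarge", "2xlarge"}),
    "m4": ("m5", 5, {"large", "xlarge", "2xlarge", "4xlarge", "16xlarge"}),
}


def suggest_upgrade(instance_type):
    """Return (new_type, savings_percent), or None if no one-to-one upgrade is listed."""
    family, _, size = instance_type.partition(".")
    if family not in UPGRADES:
        return None
    new_family, savings, shared_sizes = UPGRADES[family]
    if size not in shared_sizes:
        return None  # e.g. m4.10xlarge has no m5.10xlarge; map it manually
    return f"{new_family}.{size}", savings
```

Returning None for sizes without a direct counterpart keeps the suggestion list limited to changes that need no capacity re-planning.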

Step 3: Execute Migration

For EC2 (x86 to x86):

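Confirm the AMI has ENA and NVMe drivers when moving from a Xen-based type (t2, m4) to a Nitro-based type (t3, m5)
Stop instance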
Change instance type
Start instance
Verify application
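
The stop, change, and start steps can be scripted with boto3. A minimal sketch, assuming an EBS-backed instance and a hypothetical helper name; application verification is left to the caller:

```python
def change_instance_type(ec2, instance_id, new_type):
    """Stop an EBS-backed instance, change its type, and start it again.

    `ec2` is a boto3 EC2 client, for example boto3.client("ec2").
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # The instance type can only be changed while the instance is stopped.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

Passing the client in as an argument keeps the helper independent of profile and Region handling, and lets it be exercised with a stub before touching real instances.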

For Graviton Migration:

Create ARM64 AMI or Docker image
Launch new Graviton instance
Test thoroughly
Cut over traffic
Terminate old instance

Step 4: Validate Savings

Monitor new costs in Cost Explorer
Verify performance is acceptable
Document migration for other teams

Reference: See references/best_practices.md → Compute Optimization

Workflow 4: Spot Instance Evaluation

When: For fault-tolerant workloads or Auto Scaling Groups

Step 1: Identify Candidates

# Analyze workloads for Spot suitability
python3 scripts/spot_recommendations.py

# Evaluates:
# - Instances in Auto Scaling Groups (good candidates)
# - Dev/test/staging environments
# - Batch processing workloads
# - CI/CD and build servers

Step 2: Assess Suitability

Excellent for Spot:

Stateless applications
Batch jobs
CI/CD pipelines
Data processing
Auto Scaling Groups

NOT suitable for Spot:

Databases (without replicas)
Stateful applications
Real-time services
Mission-critical workloads

Step 3: Implementation Strategy

Option 1: Fargate Spot (Easiest)

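# requiresCompatibilities is set in the ECS task definition;
# capacityProviderStrategy is set on the ECS service (or on run-task)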
requiresCompatibilities:
  - FARGATE
capacityProviderStrategy:
  - capacityProvider: FARGATE_SPOT
    weight: 70  # 70% Spot
  - capacityProvider: FARGATE
    weight: 30  # 30% On-Demand

Option 2: EC2 Auto Scaling with Spot

# Mixed instances policy
MixedInstancesPolicy:
  InstancesDistribution:
    OnDemandBaseCapacity: 2
    OnDemandPercentageAboveBaseCapacity: 30
    SpotAllocationStrategy: capacity-optimized
  LaunchTemplate:
    Overrides:
      - InstanceType: m5.large
      - InstanceType: m5a.large
      - InstanceType: m5n.large

Option 3: EC2 Spot Fleet

# Create Spot Fleet with diverse instance types
aws ec2 request-spot-fleet --spot-fleet-request-config file://spot-fleet.json

Step 4: Implement Interruption Handling

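# Handle the 2-minute interruption notice
# Instance metadata: http://169.254.169.254/latest/meta-data/spot/instance-action
# (returns 404 until an interruption is scheduled)

# In application:
1. Poll for the interruption notice (every 5 seconds or so)
2. Stop accepting new work and drain connections
3. Save state and shut down gracefully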
4. Exit
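
The polling step can be sketched with the standard library alone. The function names and the injectable `fetch` parameter are illustrative choices; the sketch assumes IMDSv2 (a session token is requested first) and must run on the instance itself to reach the metadata endpoint.

```python
import json
import time
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout=2):
    """Return the raw instance-action JSON, or None if no interruption is scheduled."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404 means no interruption notice yet
            return None
        raise


def parse_instance_action(body):
    """Return (action, time) from the notice body, or None when there is no notice."""
    if not body:
        return None
    notice = json.loads(body)
    return notice["action"], notice["time"]


def wait_for_interruption(on_interrupt, fetch=fetch_instance_action,
                          interval=5, max_polls=None):
    """Poll until a notice appears, then call on_interrupt(action, time)."""
    polls = 0
    while max_polls is None or polls < max_polls:
        notice = parse_instance_action(fetch())
        if notice:
            on_interrupt(*notice)
            return True
        polls += 1
        time.sleep(interval)
    return False
```

The `on_interrupt` callback is where the application would drain connections and save state; running the poller in a background thread keeps it out of the request path.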

Reference: See references/best_practices.md → Compute Optimization → Spot Instances

Quick Reference: Cost Optimization Scripts

All Scripts Location

ls scripts/
# find_unused_resources.py
# analyze_ri_recommendations.py
# detect_old_generations.py
# spot_recommendations.py
# rightsizing_analyzer.py
# cost_anomaly_detector.py

Script Usage Patterns

Monthly Review (Run all):

python3 scripts/find_unused_resources.py
python3 scripts/cost_anomaly_detector.py --days 30
python3 scripts/rightsizing_analyzer.py --days 30

Quarterly Optimization:

python3 scripts/analyze_ri_recommendations.py --days 60
python3 scripts/detect_old_generations.py
python3 scripts/spot_recommendations.py

Specific Region Only:

python3 scripts/find_unused_resources.py --region us-east-1
python3 scripts/rightsizing_analyzer.py --region us-west-2

Named AWS Profile:

python3 scripts/find_unused_resources.py --profile production
python3 scripts/cost_anomaly_detector.py --profile production --days 60

Script Requirements

# Install dependencies
pip install boto3 tabulate

# AWS credentials required
# Configure via: aws configure
# Or use: --profile PROFILE_NAME

Service-Specific Optimization

Compute Optimization

Key Actions:

Migrate to Graviton (20% savings)
Use Spot for fault-tolerant workloads (70% savings)
Purchase RIs for stable workloads (40-65% savings)
Right-size oversized instances

Reference: references/best_practices.md → Compute Optimization

Storage Optimization

Key Actions:

Convert gp2 → gp3 (20% savings)
Implement S3 lifecycle policies (50-95% savings)
Delete old snapshots
Use S3 Intelligent-Tiering
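
The gp2 → gp3 figure follows directly from the per-GiB price difference. A worked sketch, where the two prices are assumptions (verify them for your Region) and the function name is hypothetical:

```python
# Assumed list prices in USD per GiB-month; verify for your Region.
GP2_PER_GIB = 0.10
GP3_PER_GIB = 0.08


def gp3_monthly_savings(size_gib):
    """Monthly storage savings from converting a gp2 volume of `size_gib` to gp3.

    Assumes the gp3 baseline (3,000 IOPS, 125 MiB/s) is sufficient; provisioning
    extra IOPS or throughput on gp3 costs more and reduces the savings.
    """
    return round(size_gib * (GP2_PER_GIB - GP3_PER_GIB), 2)
```

Under these assumed prices the saving is 20% of the storage charge regardless of volume size, so the largest gp2 volumes are the ones worth converting first.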

Reference: references/best_practices.md → Storage Optimization

Network Optimization

Key Actions:

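Route traffic to AWS services through VPC Endpoints instead of NAT Gateways, and remove NAT Gateways that become idle (save $25-30/month each, plus per-GB processing charges)
Use CloudFront to reduce data transfer costs
Colocate chatty resources in the same AZ where availability requirements allow (cross-AZ traffic is billed per GB)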

Reference: references/best_practices.md → Network Optimization

Database Optimization

Key Actions:

Right-size RDS instances
Use gp3 storage (20% cheaper than gp2)
Evaluate Aurora Serverless for variable workloads
Purchase RDS Reserved Instances

Reference: references/best_practices.md → Database Optimization

Service Alternatives Decision Guide

Need help choosing between services?

Question: "Should I use EC2, Lambda, or Fargate?" Answer: See references/service_alternatives.md → Compute Alternatives

Question: "Which S3 storage class should I use?" Answer: See references/service_alternatives.md → Storage Alternatives

Question: "Should I use RDS or Aurora?" Answer: See references/service_alternatives.md → Database Alternatives

Question: "NAT Gateway vs VPC Endpoint vs NAT Instance?" Answer: See references/service_alternatives.md → Networking Alternatives

FinOps Governance & Process

Setting Up FinOps

Phase 1: Foundation (Month 1)

Enable Cost Explorer
Set up AWS Budgets
Define tagging strategy
Activate cost allocation tags

Phase 2: Visibility (Months 2-3)

Implement tagging enforcement
Run optimization scripts
Set up monthly reviews
Create team cost reports
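
Tagging enforcement starts with measuring compliance. The sketch below assumes a hypothetical policy of three required tag keys and hypothetical function names; the resource shape matches the "Tags" list that most EC2 describe_* calls return.

```python
REQUIRED_TAGS = {"Owner", "CostCenter", "Environment"}  # hypothetical tagging policy


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the sorted required tag keys that are absent or empty on a resource."""
    present = {t["Key"] for t in resource.get("Tags", []) if t.get("Value")}
    return sorted(required - present)


def compliance_rate(resources, required=REQUIRED_TAGS):
    """Percentage of resources that carry every required tag."""
    if not resources:
        return 100.0
    compliant = sum(1 for r in resources if not missing_tags(r, required))
    return round(100.0 * compliant / len(resources), 1)
```

Tracking the compliance rate per team over time gives the monthly review a concrete number to report alongside spend.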

Phase 3: Culture (Ongoing)

Cost metrics in engineering KPIs
Cost review in architecture decisions
Regular optimization sprints
FinOps champions in each team

Full Guide: See references/finops_governance.md

Monthly Review Process

Week 1: Data Collection

Run all optimization scripts
Export Cost & Usage Reports
Compile findings

Week 2: Analysis

Identify trends
Find opportunities
Prioritize actions

Week 3: Team Reviews

Present to engineering teams
Discuss optimizations
Assign action items

Week 4: Executive Reporting

Create executive summary
Forecast next quarter
Report optimization wins

Template: See assets/templates/monthly_cost_report.md

Detailed Process: See references/finops_governance.md → Monthly Review Process

Cost Optimization Checklist

Quick Wins (Do First)

[ ] Delete unattached EBS volumes
[ ] Delete old EBS snapshots (>90 days)
[ ] Release unused Elastic IPs
[ ] Convert gp2 → gp3 volumes
[ ] Stop/terminate idle EC2 instances
[ ] Enable S3 Intelligent-Tiering
[ ] Set up AWS Budgets and alerts

Medium Effort (This Quarter)

[ ] Right-size oversized instances
[ ] Migrate to newer instance generations
[ ] Purchase Reserved Instances for stable workloads
[ ] Implement S3 lifecycle policies
[ ] Replace NAT Gateways with VPC Endpoints (where applicable)
[ ] Enable automated resource scheduling (dev/test)
[ ] Implement tagging strategy and enforcement

Strategic Initiatives (Ongoing)

[ ] Migrate to Graviton instances
[ ] Implement Spot for fault-tolerant workloads
[ ] Establish monthly cost review process
[ ] Set up cost allocation by team
[ ] Implement chargeback/showback model
[ ] Create FinOps culture and practices

Troubleshooting Cost Issues

"My bill suddenly increased"

Run cost anomaly detection:

python3 scripts/cost_anomaly_detector.py --days 30

Check Cost Explorer for service breakdown
Review CloudTrail for resource creation events
Check for Auto Scaling events
Verify that no Reserved Instances or Savings Plans have expired

"I need to reduce costs by X%"

Follow the optimization workflow:

Run all discovery scripts
Calculate total potential savings
Prioritize by: Savings Amount × (1 / Effort)
Focus on quick wins first
Implement strategic changes for long-term
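
The prioritization rule above amounts to sorting by savings per unit of effort. A minimal sketch, where the field names and the effort scale (for example, estimated engineer-days) are assumptions:

```python
def prioritize(opportunities):
    """Sort opportunities by monthly savings divided by effort, highest first.

    Each item is a dict with "name", "monthly_savings" (USD), and "effort"
    (a positive score, for example estimated engineer-days).
    """
    return sorted(
        opportunities,
        key=lambda o: o["monthly_savings"] / o["effort"],
        reverse=True,
    )
```

A small cleanup task can outrank a large migration under this rule, which matches the advice to take quick wins first and schedule strategic changes separately.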

"How do I know if Reserved Instances make sense?"

Run RI analysis:

python3 scripts/analyze_ri_recommendations.py --days 60

Look for:

Instances running 60+ days consistently
Workloads that won't change
Savings > 30%

"Which resources can I safely delete?"

Run unused resource finder:

python3 scripts/find_unused_resources.py

Safe to delete (usually):

Unattached EBS volumes (after verifying)
Snapshots > 90 days (if backups exist elsewhere)
Unused Elastic IPs (after verifying not in DNS)
Stopped EC2 instances > 30 days (after confirming abandoned)

Always verify with resource owner before deletion!
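
The 90-day snapshot rule can be applied as a filter before any manual review. A sketch with a hypothetical function name; the input has the shape of the "Snapshots" list from boto3's describe_snapshots, where StartTime is a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def snapshots_older_than(snapshots, days=90, now=None):
    """Return IDs of snapshots whose StartTime is more than `days` days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]
```

The result is a list of candidates only: a snapshot that backs a registered AMI cannot be deleted until the AMI is deregistered, and the owner check above still applies.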

Best Practices Summary

Tag Everything: Consistent tagging enables cost allocation and accountability
Monitor Continuously: Weekly script runs catch waste early
Review Monthly: Regular reviews prevent cost drift
Right-size Proactively: Don't wait for cost issues to optimize
Use Commitments Wisely: RIs/SPs for stable workloads only
Test Before Migrating: Especially for Graviton or Spot
Automate Cleanup: Scheduled shutdown of dev/test resources
Share Wins: Celebrate cost savings to build FinOps culture

Additional Resources

Detailed References:

references/best_practices.md: Comprehensive optimization strategies
references/service_alternatives.md: Cost-effective service selection
references/finops_governance.md: Organizational FinOps practices

Templates:

assets/templates/monthly_cost_report.md: Monthly reporting template

Scripts:

All scripts live in the scripts/ directory; run any of them with --help for usage

AWS Documentation:

AWS Cost Explorer: https://aws.amazon.com/aws-cost-management/aws-cost-explorer/
AWS Budgets: https://aws.amazon.com/aws-cost-management/aws-budgets/
FinOps Foundation: https://www.finops.org

Adoption

ahmedasmar/aws-cost-finops
