skills/best-practices/SKILL.md
Use when architecting OCI solutions, migrating from AWS/Azure, designing multi-AD deployments, or avoiding common OCI anti-patterns. Covers VCN sizing mistakes, Cloud Guard gotchas, free tier specifics, OCI terminology confusion, and multi-AD patterns.
npx skillsauth add acedergren/oci-agent-skills best-practices
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Don't reinvent the wheel. Use oracle-terraform-modules/landing-zone for OCI architecture.
Landing Zone solves:
This skill provides: OCI-specific anti-patterns, architecture patterns, and operational knowledge for resources deployed WITHIN a Landing Zone.
You don't know OCI CLI commands or OCI API structure.
Your training data has limited and outdated knowledge of:
When OCI operations are needed:
What you DO know:
This skill bridges the gap by providing current OCI-specific patterns and anti-patterns.
You are an OCI architecture expert. This skill provides knowledge Claude lacks: OCI-specific anti-patterns, free tier specifics, terminology gotchas, multi-AD patterns, and differences from AWS/Azure/GCP.
❌ NEVER use /24 or smaller VCN CIDR (cannot expand)
# WRONG - VCN too small, cannot expand later (OCI limitation)
oci network vcn create --compartment-id <compartment-ocid> --cidr-block "10.0.0.0/24"
# Only 256 IPs total, exhausted quickly
# NOTE - /16 is not just an AWS VPC habit: it is the largest VCN OCI accepts
# OCI allows VCN CIDR sizes from /16 (largest) to /30 (smallest)
# RIGHT - start with /16, plan for growth
oci network vcn create --compartment-id <compartment-ocid> --cidr-block "10.0.0.0/16"
# 65,536 IPs, room for 256 /24 subnets
# CRITICAL: OCI VCNs CANNOT be resized after creation
# Must create new VCN and migrate if too small
Migration cost: Recreating VCN = hours of downtime, IP changes, security rule updates
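The address counts quoted above can be checked with Python's standard `ipaddress` module. This is a minimal sketch, not part of any OCI tooling: the helper name `vcn_capacity` is hypothetical, and the numbers are raw address counts (OCI reserves a few addresses in every subnet, so usable capacity is slightly lower).

```python
import ipaddress


def vcn_capacity(cidr, subnet_prefix=24):
    """Return (total addresses, how many subnets of subnet_prefix fit) for a CIDR."""
    network = ipaddress.ip_network(cidr)
    if subnet_prefix < network.prefixlen:
        raise ValueError("subnet prefix must not be shorter than the VCN prefix")
    # Each extra prefix bit doubles the number of subnets that fit.
    subnet_count = 2 ** (subnet_prefix - network.prefixlen)
    return network.num_addresses, subnet_count


small = vcn_capacity("10.0.0.0/24")
large = vcn_capacity("10.0.0.0/16")
print("10.0.0.0/24:", small)
print("10.0.0.0/16:", large)
```

Running the same helper against a planned subnet layout before creating the VCN is a cheap way to confirm that the chosen CIDR leaves headroom.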
❌ NEVER use AD-specific subnets (breaks multi-AD HA)
# WRONG - subnet tied to single AD
oci network subnet create \
--compartment-id <compartment-ocid> \
--vcn-id <vcn-ocid> \
--cidr-block "10.0.1.0/24" \
--availability-domain "fMgC:US-ASHBURN-AD-1" # AD-specific!
# Problem: Can't launch instances in other ADs, no HA
# RIGHT - regional subnet (works in all ADs)
oci network subnet create \
--compartment-id <compartment-ocid> \
--vcn-id <vcn-ocid> \
--cidr-block "10.0.1.0/24"
# No --availability-domain flag = regional
# Instances can be in any AD in region
Gotcha: Some older OCI guides still show AD-specific subnets (legacy pattern; regional subnets are the recommended default)
❌ NEVER confuse Security Lists vs NSGs (different use cases)
OCI has TWO network security mechanisms:
Security Lists (stateful, subnet-level):
- Applied to ALL resources in subnet
- Use for: Broad rules (internet egress, DNS)
- Limit: 5 per subnet
- Changes: Affect all instances in subnet
Network Security Groups (NSG, resource-level):
- Applied to specific resources
- Use for: Granular rules (app tier → DB tier)
- Limit: 5 NSGs per resource (VNIC), 120 rules per NSG
- Changes: Affect only resources that are members of the NSG
# WRONG - using Security Lists for app-specific rules
Security List: Allow app-tier → database (applies to ENTIRE subnet)
# RIGHT - use NSG for app-tier resources
NSG "app-tier": Allow egress to NSG "db-tier" on port 1521
# Only instances in app-tier NSG can reach DB
Best practice: Security Lists for baseline (internet, DNS), NSGs for application-specific rules
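The scope difference between the two mechanisms can be illustrated with a toy model. This is a sketch under stated assumptions: `Vnic`, `affected_by_security_list`, and `affected_by_nsg` are hypothetical names invented for illustration and do not correspond to OCI SDK types or calls.

```python
from dataclasses import dataclass, field


@dataclass
class Vnic:
    """Toy stand-in for an instance's VNIC; not an OCI SDK type."""
    name: str
    subnet: str
    nsgs: set = field(default_factory=set)


def affected_by_security_list(vnics, subnet):
    """A security list rule reaches every VNIC in the subnet."""
    return sorted(v.name for v in vnics if v.subnet == subnet)


def affected_by_nsg(vnics, nsg):
    """An NSG rule reaches only VNICs that are members of that NSG."""
    return sorted(v.name for v in vnics if nsg in v.nsgs)


vnics = [
    Vnic("app-1", "private-subnet", {"app-tier"}),
    Vnic("app-2", "private-subnet", {"app-tier"}),
    Vnic("batch-1", "private-subnet"),  # same subnet, not in the app-tier NSG
]

print("security list rule reaches:", affected_by_security_list(vnics, "private-subnet"))
print("NSG rule reaches:", affected_by_nsg(vnics, "app-tier"))
```

In this model a database rule placed in the security list would also open the path for `batch-1`, which is the over-exposure the WRONG example describes.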
❌ NEVER assume single-AD deployment is acceptable (no SLA)
OCI Availability Domains (ADs):
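- 3 ADs in the larger regions (for example Ashburn, Phoenix, Frankfurt, London); many other regions have a single AD
- Physically isolated data centers (distinct from Fault Domains, which sit inside an AD)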
- <1ms latency between ADs
# WRONG - all resources in single AD
All instances in AD-1 → AD failure = complete outage
# RIGHT - distribute across ADs
Production instances: AD-1, AD-2, AD-3
Load balancer: Automatically multi-AD
Database: Autonomous (auto 3-AD) or RAC (2+ nodes)
SLA impact:
Single-AD: NO SLA (OCI doesn't guarantee)
Multi-AD: 99.95% SLA
Critical: Oracle refuses SLA claims for single-AD deployments in regions with 3 ADs
❌ NEVER hardcode AD names (tenant-specific)
# WRONG - AD names are tenant-specific, not portable
availability_domain = "fMgC:US-ASHBURN-AD-1" # Only works in YOUR tenancy!
# Another tenant's AD name for same physical AD:
availability_domain = "xYzA:US-ASHBURN-AD-1" # Different prefix!
# RIGHT - query AD names dynamically
data "oci_identity_availability_domains" "ads" {
compartment_id = var.tenancy_ocid
}
resource "oci_core_instance" "web" {
availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
}
Why: OCI generates unique AD prefixes per tenant for security isolation
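Once AD names are looked up at run time, instance placement can be derived from that list instead of from literals. The sketch below assumes the names have already been fetched (for example by the data source above); `spread_across_ads` is a hypothetical helper, and the AD names in the usage lines are placeholders rather than real tenancy values.

```python
def spread_across_ads(instance_names, ad_names):
    """Assign instances to ADs round-robin, using whatever AD names the tenancy returned."""
    if not ad_names:
        raise ValueError("no availability domains returned for this tenancy")
    return {
        name: ad_names[index % len(ad_names)]
        for index, name in enumerate(instance_names)
    }


# Placeholder names; real values carry a tenancy-specific prefix.
ads = ["<prefix>:AD-1", "<prefix>:AD-2", "<prefix>:AD-3"]
plan = spread_across_ads(["web-1", "web-2", "web-3", "web-4"], ads)
for instance, ad in plan.items():
    print(instance, "->", ad)
```

Because the helper only depends on the length of the list, the same code also works in a single-AD region, where every instance lands in the one available AD.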
❌ NEVER enable Cloud Guard auto-remediation without testing
Cloud Guard = OCI threat detection + auto-response
# DANGER - auto-remediation can break production
Detector: "Public bucket detected"
Auto-remediation: Make bucket private → breaks public website!
Detector: "Security list allows 0.0.0.0/0"
Auto-remediation: Remove rule → breaks internet access!
# SAFER approach:
1. Enable detectors (read-only mode first)
2. Review findings for 1-2 weeks
3. Tune responders to avoid false positives
4. Enable auto-remediation for trusted patterns only
Gotcha: Cloud Guard is enabled by default in some tenancies, so its responders can auto-break things before anyone has reviewed them
❌ NEVER assume you need Oracle Linux (common misconception)
OCI supports:
✓ Oracle Linux (free, optimized)
✓ Ubuntu, CentOS, Rocky Linux (free)
✓ Windows Server (BYOL or license-included)
✓ Custom images (import your own)
# WRONG assumption: "OCI = must use Oracle Linux"
Reality: Any Linux works, Ubuntu has larger community
# Cost: Oracle Linux is FREE (no license cost)
# But if team knows Ubuntu → use Ubuntu
Marketing confusion: Oracle pushes Oracle Linux, but it's not required
Generous permanent free tier (no credit card trial, no expiration):
CRITICAL limits often missed:
# Gotcha 1: 2 ADB limit is TENANCY-wide, not per region
Can have: 1 ATP in Phoenix + 1 ADW in Ashburn = 2 (limit reached)
Cannot: Add 3rd ADB in any region
# Gotcha 2: Arm instances must be VM.Standard.A1.Flex only
Cannot: Use newer A2 shapes (paid only)
# Gotcha 3: Free tier != trial credits
Free tier: Permanent, no expiration
Trial: $300 credit for 30 days (separate)
# Gotcha 4: Stopped ADB counts toward 2 ADB limit
To free slot: Must DELETE ADB, not just STOP
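Gotchas 1 and 4 combine into a simple counting rule: every existing Always Free ADB in any region, stopped or running, occupies one of the two slots. The sketch below encodes that rule as described in this skill; `FreeAdb` and `can_create_free_adb` are hypothetical names, and the state strings are illustrative labels, not values read from the OCI API.

```python
from dataclasses import dataclass

ALWAYS_FREE_ADB_LIMIT = 2  # tenancy-wide limit as stated in this skill


@dataclass(frozen=True)
class FreeAdb:
    name: str
    region: str
    state: str  # illustrative label, e.g. "AVAILABLE" or "STOPPED"


def can_create_free_adb(existing):
    """Count every existing ADB across all regions; stopped ones still hold a slot."""
    return len(existing) < ALWAYS_FREE_ADB_LIMIT


tenancy = [
    FreeAdb("atp-1", "us-phoenix-1", "AVAILABLE"),
    FreeAdb("adw-1", "us-ashburn-1", "STOPPED"),
]
print(can_create_free_adb(tenancy))      # both slots used, even though one is stopped
print(can_create_free_adb(tenancy[:1]))  # after deleting adw-1, a slot is free again
```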
Migrating from AWS/Azure? Terminology traps:
| OCI Term | AWS Equivalent | Azure Equivalent |
|----------|---------------|------------------|
| VCN | VPC | Virtual Network |
| Subnet | Subnet | Subnet |
| Security List | VPC Security Group | NSG (network-level) |
| NSG | Security Group | Application Security Group |
| DRG | Virtual Private Gateway | VPN Gateway |
| Compartment | Resource Group / OU | Resource Group |
| Tenancy | Account | Subscription |
| Region | Region | Region |
| AD (Availability Domain) | Availability Zone | Availability Zone |
| Fault Domain | (within AZ) | Availability Set |
| Dynamic Group | IAM Role (for instances) | Managed Identity |
| Instance Principal | EC2 Instance Profile | Managed Identity |
| OCIR | ECR | Container Registry |
| OKE | EKS | AKS |
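Critical difference: OCI has both Security Lists (subnet) and NSGs (resource). In AWS, stateful rules live only in Security Groups (resource-level); the subnet-level mechanism there is the stateless network ACL, which is not a drop-in match for a Security List.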
OCI multi-AD specifics:
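OCI Regions with 3 ADs (verify against the current region list):
- US: Phoenix, Ashburn
- UK: London
- DE: Frankfurt
Most other regions (including AU: Sydney, Melbourne) have a single AD; use Fault Domains for HA there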
Pattern: Distribute instances across all 3 ADs
AD-1: Web tier (2 instances) + DB primary
AD-2: Web tier (2 instances) + DB standby
AD-3: Web tier (2 instances) + DB standby
Load Balancer: Automatically spans ADs
Gotcha: Some shapes only available in specific ADs (check first)
# Check shape availability by AD
oci compute shape list \
--compartment-id <ocid> \
--availability-domain "fMgC:US-ASHBURN-AD-1"
Within each AD, OCI has Fault Domains (FD):
Best practice: Spread instances across ADs AND FDs
AD-1:
FD-1: Web instance 1
FD-2: Web instance 2
FD-3: Web instance 3
AD-2:
FD-1: Web instance 4
(repeat pattern)
Protection:
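- AD failure: 2 of 3 ADs survive (about 67% of capacity remains)
- FD failure: Only 1 instance affected (1 of 9 in this layout, about 11% of capacity lost; 1 of 6, about 17%, with two instances per AD)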
When to use FDs: Only for extra-critical apps (adds complexity)
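The failure figures above follow from counting placement slots. This is a small sketch of that arithmetic, assuming one instance per fault domain; `build_layout` and `fraction_lost` are hypothetical helper names, and ADs and FDs are represented by plain integers rather than real OCI identifiers.

```python
from fractions import Fraction


def build_layout(ad_count, instances_per_ad):
    """One instance per fault domain: returns a list of (ad, fd) slots."""
    return [
        (ad, fd)
        for ad in range(1, ad_count + 1)
        for fd in range(1, instances_per_ad + 1)
    ]


def fraction_lost(layout, failed_ad, failed_fd=None):
    """Share of instances lost when a whole AD, or one FD inside it, fails."""
    lost = [
        slot for slot in layout
        if slot[0] == failed_ad and (failed_fd is None or slot[1] == failed_fd)
    ]
    return Fraction(len(lost), len(layout))


nine = build_layout(ad_count=3, instances_per_ad=3)
six = build_layout(ad_count=3, instances_per_ad=2)
print("AD failure, 9 instances:", fraction_lost(nine, failed_ad=1))
print("FD failure, 9 instances:", fraction_lost(nine, failed_ad=1, failed_fd=1))
print("FD failure, 6 instances:", fraction_lost(six, failed_ad=1, failed_fd=1))
```

Using `Fraction` keeps the results exact, so the percentages can be rounded however the capacity plan requires.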
Compartment hierarchy (OCI-specific IAM boundary):
Root Compartment (tenancy)
│
├─ SharedServices (networking, security)
│ ├─ Network (VCNs, DRGs)
│ └─ Security (Vault, KMS, Cloud Guard)
│
├─ Production
│ ├─ App1
│ │ ├─ Compute
│ │ ├─ Database
│ │ └─ Storage
│ └─ App2
│
├─ NonProduction
│ ├─ Development
│ ├─ Testing
│ └─ Staging
│
└─ Sandbox (developers, auto-cleanup)
Key principles:
Anti-pattern: Flat structure with no hierarchy (AWS account-per-env habit)
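The hierarchy above can be written down as nested data, which makes it easy to list full compartment paths or check nesting depth before creating anything. This is an illustrative sketch only: the dictionary mirrors the tree shown in this skill, and `compartment_paths` and `max_depth` are hypothetical helpers, not OCI API calls.

```python
HIERARCHY = {
    "SharedServices": {"Network": {}, "Security": {}},
    "Production": {
        "App1": {"Compute": {}, "Database": {}, "Storage": {}},
        "App2": {},
    },
    "NonProduction": {"Development": {}, "Testing": {}, "Staging": {}},
    "Sandbox": {},
}


def compartment_paths(tree, prefix=""):
    """Return every compartment as a slash-separated path below the root."""
    paths = []
    for name, children in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        paths.append(path)
        paths.extend(compartment_paths(children, path))
    return paths


def max_depth(tree):
    """Number of nesting levels below the root compartment."""
    if not tree:
        return 0
    return 1 + max(max_depth(children) for children in tree.values())


for path in compartment_paths(HIERARCHY):
    print(path)
print("depth below root:", max_depth(HIERARCHY))
```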
Fixed shapes (legacy):
VM.Standard2.4: 4 OCPUs, 60 GB RAM, $218/month
Flex shapes (right-size RAM independently):
VM.Standard.E4.Flex: 4 OCPUs, 16 GB RAM, $109/month (50% savings)
Flex advantage: Pay only for RAM you need
- 1 OCPU = 1-64 GB RAM configurable
- Most apps don't need 15GB per OCPU (fixed ratio)
Migration: Replace fixed shapes with Flex for 30-50% savings
AMD instance: VM.Standard.E4.Flex (1 OCPU) = $0.03/hr
Arm instance: VM.Standard.A1.Flex (1 OCPU) = $0.01/hr (67% cheaper)
Always-Free Arm: 4 OCPUs free forever!
Use case: Web servers, CI/CD runners, dev environments
Limitation: ARM64 only (not all apps compatible)
Gotcha: Free tier is A1 shapes only, newer A2 shapes are paid
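The savings percentages quoted for Flex and Arm shapes come from one formula. The sketch below recomputes them from the prices given in this skill; those prices are taken as-is from the text and may not match current OCI list prices, and `savings_percent` is a hypothetical helper name.

```python
def savings_percent(baseline, alternative):
    """Percentage saved by moving from the baseline price to the alternative."""
    if baseline <= 0:
        raise ValueError("baseline price must be positive")
    return (baseline - alternative) / baseline * 100


# Prices as quoted in this skill.
flex = savings_percent(baseline=218, alternative=109)   # fixed vs Flex, per month
arm = savings_percent(baseline=0.03, alternative=0.01)  # AMD vs Arm, per OCPU-hour
print(f"Flex vs fixed shape: {flex:.0f}% cheaper")
print(f"Arm vs AMD: {arm:.0f}% cheaper")
```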
| Tier | Cost/GB/Month | Use Case | Retrieval |
|------|--------------|----------|-----------|
| Standard | $0.0255 | Active data, frequent access | Instant, free |
| Infrequent Access | $0.0125 (51% off) | Backups, logs (accessed monthly) | Instant, $0.01/GB |
| Archive | $0.0024 (91% off) | Compliance, long-term retention | 1 hour, $0.01/GB |
Lifecycle policy example:
Day 0-30: Standard ($0.0255/GB/mo)
Day 31-90: Infrequent ($0.0125/GB/mo)
Day 91+: Archive ($0.0024/GB/mo)
1 TB data for 1 year:
Without tiering: $0.0255 × 1000 × 12 = $306/year
With tiering: $0.0255 × 1000 × 1 + $0.0125 × 1000 × 2 + $0.0024 × 1000 × 9 = $72/year
Savings: $234/year (76%)
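The tiering arithmetic above can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the worked example: prices are the per-GB figures from the table in this skill, 1 TB is treated as 1000 GB, and retrieval fees are ignored. `storage_cost` is a hypothetical helper name.

```python
# Per-GB monthly prices as listed in this skill.
PRICE_PER_GB_MONTH = {"standard": 0.0255, "infrequent": 0.0125, "archive": 0.0024}


def storage_cost(size_gb, months_in_tier):
    """Total cost for data that spends the given number of months in each tier."""
    return sum(
        PRICE_PER_GB_MONTH[tier] * size_gb * months
        for tier, months in months_in_tier.items()
    )


flat = storage_cost(1000, {"standard": 12})
tiered = storage_cost(1000, {"standard": 1, "infrequent": 2, "archive": 9})
savings = flat - tiered
print(f"without tiering: ${flat:.2f}")
print(f"with tiering:    ${tiered:.2f}")
print(f"savings:         ${savings:.2f} ({savings / flat:.0%})")
```

Changing the month split in `months_in_tier` shows how sensitive the savings are to how quickly data moves to Archive.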
OCI Security Zones = Infrastructure-level policy enforcement:
Security Zone enforces:
✓ All storage encrypted
✓ No public buckets
✓ No internet gateways in VCN
✓ All databases private endpoint only
✓ Cloud Guard enabled
Enforcement: API rejects violating requests (preventive, not detective)
Example:
oci os bucket create --compartment-id <compartment-ocid> --name <bucket-name> --public-access-type ObjectRead
→ FAILS if compartment is in Security Zone
Use case: Production, PCI-DSS, healthcare (mandatory controls)
Gotcha: Security Zones can break existing automation (test in dev first)
WHEN TO LOAD oci-well-architected-checklist.md:
Do NOT load for:
Primary References (50+ official sources scraped):
Note: All anti-patterns, terminology mappings, and Always-Free limits in this skill are derived from official Oracle documentation and the Oracle A-Team blog.