VP Engineering Perspective

Engineering Org Design (Team Topologies)

Team Types

| Type | Purpose | Characteristics | Size | |------|---------|----------------|------| | Stream-aligned | Deliver user/business value | Full-stack, autonomous, owns entire feature slice | 5-8 | | Platform | Reduce cognitive load for stream teams | Internal products, self-service APIs/tools | 3-6 | | Enabling | Help teams adopt new capabilities | Coaching, not doing; temporary engagement | 2-3 | | Complicated subsystem | Deep specialist expertise | ML, payments, security, real-time systems | 2-4 |

Interaction Modes

| Mode | Description | When to Use | |------|-------------|-------------| | Collaboration | Teams work together closely | New capability discovery, high uncertainty | | X-as-a-Service | One team provides, other consumes | Well-defined API/platform capability | | Facilitating | One team coaches another | Skill transfer, technology adoption |

Org Design Template

## Engineering Organization: [Company Name]

### Team Map

Stream-aligned Teams:
├── Team Alpha: [product area] (5 people)
│   Owns: [service/feature list]
│   Stack: [tech stack]
├── Team Beta: [product area] (6 people)
│   Owns: [service/feature list]
│   Stack: [tech stack]
└── Team Gamma: [product area] (5 people)
    Owns: [service/feature list]
    Stack: [tech stack]

Platform Team:
└── Team Platform (4 people)
    Provides: CI/CD, observability, developer portal
    Interaction: X-as-a-Service

Enabling Team:
└── Team Enable (2 people)
    Focus: [current initiative - e.g., Kubernetes migration]
    Interaction: Facilitating (rotates every quarter)

Complicated Subsystem:
└── Team ML (3 people)
    Owns: ML pipeline, model serving, feature store
    Interaction: Collaboration with stream teams

### Cognitive Load Assessment
| Team | Intrinsic (domain) | Extraneous (tools) | Total | Status |
|------|-------------------|--------------------|----|--------|
| Alpha | 6/10 | 3/10 | 9/10 | At capacity |
| Beta | 5/10 | 4/10 | 9/10 | At capacity |
| Platform | 7/10 | 2/10 | 9/10 | At capacity |

Org Design Anti-Patterns

| Anti-Pattern | Symptom | Fix | |-------------|---------|-----| | Conway's Law violation | Architecture doesn't match team structure | Align teams to desired architecture | | Shared services bottleneck | Every team waits for "core team" | Split into platform + self-service | | Matrix management | Unclear ownership, split loyalty | Single reporting line per IC | | Too many meetings | "Alignment" overhead > execution | Reduce interaction surface, use async | | Hero culture | One person knows everything | Document, pair, rotate on-call |

Process Improvement (Agile Maturity)

Agile Maturity Model

| Level | Name | Characteristics | |-------|------|----------------| | 1 | Initial | Ad-hoc, no process, firefighting | | 2 | Managed | Basic scrum/kanban, inconsistent | | 3 | Defined | Consistent process, metrics tracked | | 4 | Measured | Data-driven decisions, predictable delivery | | 5 | Optimizing | Continuous improvement, experiments |

Process Improvement Framework

## Process Improvement Cycle

### 1. Observe (1 sprint)
- Shadow team ceremonies
- Measure cycle time, WIP, defects
- Interview team members (1-on-1)

### 2. Diagnose
| Problem | Root Cause | Impact |
|---------|-----------|--------|
| [symptom] | [why] | [what it costs] |

### 3. Hypothesize
"If we [change], then [expected outcome], measured by [metric]"

### 4. Experiment (2-3 sprints)
- Implement ONE change at a time
- Measure baseline vs. new
- Collect team feedback

### 5. Evaluate
- Did the metric improve?
- Did the team feel the improvement?
- Any unintended side effects?

### 6. Adopt or Revert
- Improvement verified: document and standardize
- No improvement: revert and try next hypothesis

Common Process Fixes

| Problem | Fix | Metric | |---------|-----|--------| | Missed deadlines | Smaller stories, better estimation | Story completion rate | | Too much WIP | WIP limits (Kanban) | Cycle time | | Unclear requirements | Refinement meetings, acceptance criteria | Defect rate | | Deployment fear | Feature flags, canary deploys | Deploy frequency | | Slow code reviews | SLA (24h max), small PRs | Review turnaround | | Meeting overload | No-meeting days, async updates | Focus time % |

Cross-Team Dependency Management

Dependency Mapping

## Cross-Team Dependencies: [Quarter]

### Dependency Matrix
| Providing Team | Consuming Team | Dependency | Type | Status | Risk |
|---------------|---------------|-----------|------|--------|------|
| Platform | Alpha | Auth service v2 | Blocking | In progress | Medium |
| Alpha | Beta | User API | Non-blocking | Available | Low |
| ML | Gamma | Rec engine | Blocking | Not started | High |

### Dependency Types
- **Blocking:** Must be completed before consumer can start
- **Non-blocking:** Can work in parallel with mocked interface
- **Soft:** Nice to have, workaround exists

### Visualization
Team Alpha ──blocks──→ Team Beta (User API)
Team Platform ──blocks──→ Team Alpha (Auth v2)
Team ML ──blocks──→ Team Gamma (Rec engine) ← HIGH RISK

Dependency Resolution Strategies

| Strategy | When to Use | |----------|-------------| | Contract-first | Define API contract, both teams implement independently | | Embedded engineer | Loan an engineer from providing team | | Shared interface | Agree on interface, mock until ready | | Prioritize differently | Move blocking work to top of providing team's backlog | | Decouple | Feature flags, adapter pattern, event-driven | | Eliminate | Redesign to remove dependency entirely |

Dependency Anti-Patterns

| Anti-Pattern | Neden Yanlis | Dogru Yol | |-------------|-------------|-----------| | Hidden dependencies | Discovered too late | Map dependencies in planning | | Dependency as excuse | "Blocked by Team X" for weeks | Escalate immediately, find alternatives | | Hub team (everything flows through one) | Bottleneck | Distribute ownership, self-service | | Cross-team code ownership | Slow PRs, merge conflicts | Clear ownership boundaries |

Engineering Culture Building

Culture Pillars

## Engineering Culture: [Company Name]

### Our Values (with behaviors)

1. **Ownership**
   - Do: Take responsibility end-to-end (build, deploy, monitor)
   - Don't: "Not my code" / "That's ops problem"
   - Measure: On-call engagement, post-incident participation

2. **Craft**
   - Do: Write tests, review thoughtfully, refactor proactively
   - Don't: "Ship now, fix later" (unless P0)
   - Measure: Code review quality, tech debt ratio

3. **Transparency**
   - Do: Share context, document decisions, default to public channels
   - Don't: Hoarding information, private DMs for team decisions
   - Measure: Documentation coverage, team survey

4. **Learning**
   - Do: Blameless retros, share mistakes, invest in growth
   - Don't: Blame individuals, hide failures
   - Measure: Retro action items completed, conference talks

5. **Speed**
   - Do: Small PRs, feature flags, iterate quickly
   - Don't: Big bang releases, analysis paralysis
   - Measure: Lead time, deploy frequency

Culture Building Practices

| Practice | Frequency | Owner | Goal | |----------|-----------|-------|------| | Blameless post-mortems | Per incident | Engineering managers | Learn from failures | | Engineering all-hands | Monthly | VP Engineering | Alignment, wins, direction | | Tech talks / brown bags | Biweekly | Rotating engineers | Knowledge sharing | | Hack days / hackathon | Quarterly | Engineering leads | Innovation, morale | | Architecture review | Biweekly | Architects | Consistency, quality | | 1-on-1s | Weekly | Managers | Growth, retention | | Skip-level 1-on-1s | Monthly | VP/Director | Pulse check, escalation | | Engineering blog | Monthly+ | Rotating authors | Employer branding | | Open source contributions | Continuous | Anyone | Community, recruitment |

OKR Setting for Engineering

OKR Template

## Engineering OKRs: Q[X] [Year]

### Objective 1: Accelerate delivery velocity
| KR | Target | Current | Status |
|----|--------|---------|--------|
| KR1.1: Reduce lead time from code to production | < 4 hours | 2 days | [on/off track] |
| KR1.2: Increase deploy frequency | 5x/day | 2x/week | [on/off track] |
| KR1.3: Reduce change failure rate | < 5% | 12% | [on/off track] |

### Objective 2: Improve developer experience
| KR | Target | Current | Status |
|----|--------|---------|--------|
| KR2.1: Developer satisfaction score | > 4.2/5 | 3.6/5 | [on/off track] |
| KR2.2: Reduce CI build time | < 5 min | 12 min | [on/off track] |
| KR2.3: New hire productive in < 2 weeks | 90% | 60% | [on/off track] |

### Objective 3: Strengthen reliability
| KR | Target | Current | Status |
|----|--------|---------|--------|
| KR3.1: Achieve 99.95% uptime | 99.95% | 99.8% | [on/off track] |
| KR3.2: Reduce MTTR to < 30 min | 30 min | 2 hours | [on/off track] |
| KR3.3: Zero P0 incidents from known issues | 0 | 3/quarter | [on/off track] |

OKR Anti-Patterns

| Anti-Pattern | Neden Yanlis | Dogru Yol | |-------------|-------------|-----------| | Feature-based OKRs | "Ship feature X" is a task, not an outcome | Focus on outcomes ("Reduce churn by 10%") | | Too many OKRs | Diluted focus | 3 objectives, 3-4 KRs each max | | Binary KRs | No progress signal | Quantitative, measurable, with baseline | | No alignment | Disconnected from company OKRs | Cascade from company → engineering → team | | Set and forget | No mid-quarter check | Weekly tracking, monthly review |

Incident Management Maturity

Maturity Levels

| Level | Characteristics | Actions | |-------|----------------|---------| | 1: Reactive | No process, ad-hoc response, hero-driven | Document basic runbooks, assign on-call | | 2: Organized | On-call rotation, basic alerting, Slack channel | Add severity classification, escalation paths | | 3: Systematic | Incident commander role, structured comms, SLOs | Add blameless post-mortems, action item tracking | | 4: Proactive | Error budgets, chaos engineering, SLO dashboards | Game days, automated remediation | | 5: Predictive | ML-based anomaly detection, self-healing | Continuous improvement, near-zero MTTR |

Incident Management Framework

## Incident Response Structure

### Roles
| Role | Responsibility |
|------|---------------|
| Incident Commander (IC) | Coordinates response, makes decisions |
| Technical Lead | Diagnoses and fixes the issue |
| Communications Lead | Stakeholder updates, status page |
| Scribe | Documents timeline and actions |

### Severity Levels
| Level | Definition | Response Time | IC Required | Status Page | Exec Notify |
|-------|-----------|--------------|-------------|-------------|-------------|
| SEV-1 | Full outage | 5 min | Yes | Yes | Immediately |
| SEV-2 | Major degradation | 15 min | Yes | Yes | Within 1h |
| SEV-3 | Minor impact | 1 hour | No | Optional | No |
| SEV-4 | No user impact | Next business day | No | No | No |

### Communication Cadence
| SEV | Internal Update | External Update | Exec Update |
|-----|----------------|----------------|-------------|
| SEV-1 | Every 15 min | Every 30 min | Every 30 min |
| SEV-2 | Every 30 min | Every 1h | Every 2h |
| SEV-3 | Every 2h | If customer-facing | None |

Post-Incident Review Quality Checklist

[ ] Timeline is complete and accurate
[ ] Root cause (not symptoms) identified
[ ] Contributing factors documented
[ ] Action items are specific, assigned, and deadlined
[ ] "5 whys" or similar root cause analysis used
[ ] Systemic fixes preferred over individual fixes
[ ] No blame assigned to individuals
[ ] Detection improvement identified
[ ] Recovery improvement identified
[ ] Shared with broader engineering team

Platform Team Strategy

Platform Team Charter

## Platform Team Charter

### Mission
Reduce cognitive load on stream-aligned teams by providing self-service
infrastructure, tooling, and abstractions.

### Principles
1. Treat internal teams as customers
2. Self-service > ticket-based requests
3. Paved roads, not mandates
4. Measure developer experience, not just uptime

### Product Areas
| Area | What We Provide | Maturity |
|------|----------------|----------|
| CI/CD | Build pipelines, deploy automation | Mature |
| Observability | Logging, metrics, tracing, dashboards | Growing |
| Developer portal | Service catalog, docs, templates | Early |
| Infrastructure | K8s, databases, caching, queues | Mature |
| Security | Secret management, vulnerability scanning | Growing |

### Success Metrics
| Metric | Target | Current |
|--------|--------|---------|
| Time to onboard new service | < 1 day | 1 week |
| Developer satisfaction (platform) | > 4.0/5 | 3.5/5 |
| Self-service adoption rate | > 80% | 50% |
| Support tickets per team per month | < 5 | 12 |

### Roadmap (Next 2 Quarters)
| Quarter | Initiative | Impact |
|---------|-----------|--------|
| Q1 | Internal developer portal | Reduce onboarding time 50% |
| Q1 | Standardized service template | Consistent microservices |
| Q2 | Golden path for new services | < 1 hour to first deploy |
| Q2 | Self-service database provisioning | Remove DBA bottleneck |

Platform Anti-Patterns

| Anti-Pattern | Symptom | Fix | |-------------|---------|-----| | Building for no one | Platform features nobody asked for | Customer interviews, usage metrics | | Mandatory adoption | Teams forced to use half-baked tools | Make it so good they want to use it | | Ticket-based everything | Slow provisioning, frustrated teams | Self-service APIs and UIs | | No documentation | Teams can't use platform without help | Treat docs as product | | Ivory tower | Platform team disconnected from users | Embed with stream teams periodically |

Developer Experience (DX) Optimization

DX Metrics

| Metric | How to Measure | Target | |--------|---------------|--------| | Dev environment setup | Time from clone to running | < 15 min | | CI build time | Pipeline duration (p50/p95) | < 5 min (p50) | | Code review turnaround | PR open to first review | < 4 hours | | Deploy to production | Merge to live | < 1 hour | | Incident notification | Alert to human eyes | < 5 min | | Documentation freshness | % docs updated in last 90 days | > 80% | | On-call burden | Pages per week per person | < 2 | | Context switching | Interruptions per focus block | < 1 |

DX Improvement Roadmap

## DX Improvement Plan

### Quick Wins (< 1 week each)
- [ ] Pre-configured dev containers / devbox
- [ ] One-command project setup script
- [ ] PR template with checklist
- [ ] Slack bot for deploy status
- [ ] Auto-assign code reviewers

### Medium Term (1-4 weeks)
- [ ] Reduce CI build time by 50%
- [ ] Local development matches production (docker-compose)
- [ ] API documentation auto-generated from code
- [ ] Error messages link to runbooks
- [ ] Feature flag self-service UI

### Long Term (1-3 months)
- [ ] Internal developer portal (Backstage/custom)
- [ ] Self-service infrastructure provisioning
- [ ] Automated dependency updates (Renovate)
- [ ] Golden path templates for new services
- [ ] DX survey and tracking dashboard

DX Survey Template

## Developer Experience Survey (Quarterly)

Rate 1-5 (1 = terrible, 5 = excellent):

### Development
1. How easy is it to set up your local dev environment?
2. How reliable is your local dev environment?
3. How fast is your CI/CD pipeline?
4. How easy is it to find and understand documentation?

### Collaboration
5. How efficient is your code review process?
6. How well does cross-team collaboration work?
7. How effective are your team's meetings?

### Operations
8. How manageable is on-call?
9. How good are your monitoring and alerting tools?
10. How confident are you in deploying to production?

### Growth
11. How supported do you feel in your career growth?
12. How much time do you spend on meaningful work vs. toil?

### Open Ended
13. What is the biggest time-waster in your day?
14. If you could change one thing about engineering, what would it be?

Release Management at Scale

Release Strategy Options

| Strategy | When to Use | Complexity | |----------|-------------|------------| | Continuous deployment | Mature CI/CD, high test confidence | Low (automated) | | Release train | Multi-team, coordinated releases | Medium | | Feature flags | Decouple deploy from release | Medium | | Blue-green deploy | Zero-downtime requirement | Medium | | Canary release | Gradual rollout, risk mitigation | High | | Ring deployment | Internal -> beta -> GA | High |

Release Process (Multi-Team)

## Release Checklist: v[X.Y.Z]

### Pre-Release (T-2 days)
- [ ] All feature branches merged to release branch
- [ ] Release branch passes all tests
- [ ] Cross-team integration tests passing
- [ ] Dependent services compatible (API contracts)
- [ ] Database migrations tested
- [ ] Feature flags configured for new features
- [ ] Rollback plan documented

### Release Day (T-0)
- [ ] Release branch deployed to staging
- [ ] QA sign-off on staging
- [ ] Monitoring dashboards reviewed (baseline)
- [ ] On-call team briefed
- [ ] Canary deployment initiated
- [ ] Canary metrics monitored (error rate, latency, business KPIs)
- [ ] Full rollout completed
- [ ] Post-deploy verification

### Post-Release (T+1)
- [ ] Metrics compared to baseline
- [ ] No regression in error rates or latency
- [ ] Customer support briefed on changes
- [ ] Release notes published
- [ ] Feature flags cleaned up (remove old)
- [ ] Retrospective scheduled (if issues occurred)

Release Metrics

| Metric | What It Measures | Target | |--------|-----------------|--------| | Release frequency | How often we ship | Weekly or more | | Release lead time | Code complete to production | < 1 day | | Release success rate | Releases without rollback | > 95% | | Rollback rate | How often we revert | < 5% | | Hotfix frequency | Emergency fixes needed | < 1/month | | Feature flag cleanup | Stale flags removed | Within 30 days |

Release Anti-Patterns

| Anti-Pattern | Neden Yanlis | Dogru Yol | |-------------|-------------|-----------| | "Big bang" releases | High risk, hard to debug | Small, frequent releases | | Release branch lives too long | Merge conflicts, integration hell | Short-lived, merge daily | | Manual release process | Error-prone, slow | Fully automated pipeline | | No rollback plan | Stuck with broken release | Always have rollback procedure | | Feature flags never cleaned | Combinatorial explosion | Clean up within 30 days | | Friday deployments | Nobody around for issues | Deploy Mon-Thu, observe Fri | | No release notes | Users/support confused | Automated changelog generation |

VP Engineering Perspective

Engineering Org Design (Team Topologies)

Team Types

Interaction Modes

Org Design Template

## Engineering Organization: [Company Name]

### Team Map

Stream-aligned Teams:
├── Team Alpha: [product area] (5 people)
│   Owns: [service/feature list]
│   Stack: [tech stack]
├── Team Beta: [product area] (6 people)
│   Owns: [service/feature list]
│   Stack: [tech stack]
└── Team Gamma: [product area] (5 people)
    Owns: [service/feature list]
    Stack: [tech stack]

Platform Team:
└── Team Platform (4 people)
    Provides: CI/CD, observability, developer portal
    Interaction: X-as-a-Service

Enabling Team:
└── Team Enable (2 people)
    Focus: [current initiative - e.g., Kubernetes migration]
    Interaction: Facilitating (rotates every quarter)

Complicated Subsystem:
└── Team ML (3 people)
    Owns: ML pipeline, model serving, feature store
    Interaction: Collaboration with stream teams

### Cognitive Load Assessment
| Team | Intrinsic (domain) | Extraneous (tools) | Total | Status |
|------|-------------------|--------------------|----|--------|
| Alpha | 6/10 | 3/10 | 9/10 | At capacity |
| Beta | 5/10 | 4/10 | 9/10 | At capacity |
| Platform | 7/10 | 2/10 | 9/10 | At capacity |

Org Design Anti-Patterns

Process Improvement (Agile Maturity)

Agile Maturity Model

Process Improvement Framework

## Process Improvement Cycle

### 1. Observe (1 sprint)
- Shadow team ceremonies
- Measure cycle time, WIP, defects
- Interview team members (1-on-1)

### 2. Diagnose
| Problem | Root Cause | Impact |
|---------|-----------|--------|
| [symptom] | [why] | [what it costs] |

### 3. Hypothesize
"If we [change], then [expected outcome], measured by [metric]"

### 4. Experiment (2-3 sprints)
- Implement ONE change at a time
- Measure baseline vs. new
- Collect team feedback

### 5. Evaluate
- Did the metric improve?
- Did the team feel the improvement?
- Any unintended side effects?

### 6. Adopt or Revert
- Improvement verified: document and standardize
- No improvement: revert and try next hypothesis

Common Process Fixes

Cross-Team Dependency Management

Dependency Mapping

## Cross-Team Dependencies: [Quarter]

### Dependency Matrix
| Providing Team | Consuming Team | Dependency | Type | Status | Risk |
|---------------|---------------|-----------|------|--------|------|
| Platform | Alpha | Auth service v2 | Blocking | In progress | Medium |
| Alpha | Beta | User API | Non-blocking | Available | Low |
| ML | Gamma | Rec engine | Blocking | Not started | High |

### Dependency Types
- **Blocking:** Must be completed before consumer can start
- **Non-blocking:** Can work in parallel with mocked interface
- **Soft:** Nice to have, workaround exists

### Visualization
Team Alpha ──blocks──→ Team Beta (User API)
Team Platform ──blocks──→ Team Alpha (Auth v2)
Team ML ──blocks──→ Team Gamma (Rec engine) ← HIGH RISK

Dependency Resolution Strategies

Dependency Anti-Patterns

Engineering Culture Building

Culture Pillars

## Engineering Culture: [Company Name]

### Our Values (with behaviors)

1. **Ownership**
   - Do: Take responsibility end-to-end (build, deploy, monitor)
   - Don't: "Not my code" / "That's ops problem"
   - Measure: On-call engagement, post-incident participation

2. **Craft**
   - Do: Write tests, review thoughtfully, refactor proactively
   - Don't: "Ship now, fix later" (unless P0)
   - Measure: Code review quality, tech debt ratio

3. **Transparency**
   - Do: Share context, document decisions, default to public channels
   - Don't: Hoarding information, private DMs for team decisions
   - Measure: Documentation coverage, team survey

4. **Learning**
   - Do: Blameless retros, share mistakes, invest in growth
   - Don't: Blame individuals, hide failures
   - Measure: Retro action items completed, conference talks

5. **Speed**
   - Do: Small PRs, feature flags, iterate quickly
   - Don't: Big bang releases, analysis paralysis
   - Measure: Lead time, deploy frequency

Culture Building Practices

OKR Setting for Engineering

OKR Template

## Engineering OKRs: Q[X] [Year]

### Objective 1: Accelerate delivery velocity
| KR | Target | Current | Status |
|----|--------|---------|--------|
| KR1.1: Reduce lead time from code to production | < 4 hours | 2 days | [on/off track] |
| KR1.2: Increase deploy frequency | 5x/day | 2x/week | [on/off track] |
| KR1.3: Reduce change failure rate | < 5% | 12% | [on/off track] |

### Objective 2: Improve developer experience
| KR | Target | Current | Status |
|----|--------|---------|--------|
| KR2.1: Developer satisfaction score | > 4.2/5 | 3.6/5 | [on/off track] |
| KR2.2: Reduce CI build time | < 5 min | 12 min | [on/off track] |
| KR2.3: New hire productive in < 2 weeks | 90% | 60% | [on/off track] |

### Objective 3: Strengthen reliability
| KR | Target | Current | Status |
|----|--------|---------|--------|
| KR3.1: Achieve 99.95% uptime | 99.95% | 99.8% | [on/off track] |
| KR3.2: Reduce MTTR to < 30 min | 30 min | 2 hours | [on/off track] |
| KR3.3: Zero P0 incidents from known issues | 0 | 3/quarter | [on/off track] |

OKR Anti-Patterns

Incident Management Maturity

Maturity Levels

Incident Management Framework

## Incident Response Structure

### Roles
| Role | Responsibility |
|------|---------------|
| Incident Commander (IC) | Coordinates response, makes decisions |
| Technical Lead | Diagnoses and fixes the issue |
| Communications Lead | Stakeholder updates, status page |
| Scribe | Documents timeline and actions |

### Severity Levels
| Level | Definition | Response Time | IC Required | Status Page | Exec Notify |
|-------|-----------|--------------|-------------|-------------|-------------|
| SEV-1 | Full outage | 5 min | Yes | Yes | Immediately |
| SEV-2 | Major degradation | 15 min | Yes | Yes | Within 1h |
| SEV-3 | Minor impact | 1 hour | No | Optional | No |
| SEV-4 | No user impact | Next business day | No | No | No |

### Communication Cadence
| SEV | Internal Update | External Update | Exec Update |
|-----|----------------|----------------|-------------|
| SEV-1 | Every 15 min | Every 30 min | Every 30 min |
| SEV-2 | Every 30 min | Every 1h | Every 2h |
| SEV-3 | Every 2h | If customer-facing | None |

Post-Incident Review Quality Checklist

[ ] Timeline is complete and accurate
[ ] Root cause (not symptoms) identified
[ ] Contributing factors documented
[ ] Action items are specific, assigned, and deadlined
[ ] "5 whys" or similar root cause analysis used
[ ] Systemic fixes preferred over individual fixes
[ ] No blame assigned to individuals
[ ] Detection improvement identified
[ ] Recovery improvement identified
[ ] Shared with broader engineering team

Platform Team Strategy

Platform Team Charter

## Platform Team Charter

### Mission
Reduce cognitive load on stream-aligned teams by providing self-service
infrastructure, tooling, and abstractions.

### Principles
1. Treat internal teams as customers
2. Self-service > ticket-based requests
3. Paved roads, not mandates
4. Measure developer experience, not just uptime

### Product Areas
| Area | What We Provide | Maturity |
|------|----------------|----------|
| CI/CD | Build pipelines, deploy automation | Mature |
| Observability | Logging, metrics, tracing, dashboards | Growing |
| Developer portal | Service catalog, docs, templates | Early |
| Infrastructure | K8s, databases, caching, queues | Mature |
| Security | Secret management, vulnerability scanning | Growing |

### Success Metrics
| Metric | Target | Current |
|--------|--------|---------|
| Time to onboard new service | < 1 day | 1 week |
| Developer satisfaction (platform) | > 4.0/5 | 3.5/5 |
| Self-service adoption rate | > 80% | 50% |
| Support tickets per team per month | < 5 | 12 |

### Roadmap (Next 2 Quarters)
| Quarter | Initiative | Impact |
|---------|-----------|--------|
| Q1 | Internal developer portal | Reduce onboarding time 50% |
| Q1 | Standardized service template | Consistent microservices |
| Q2 | Golden path for new services | < 1 hour to first deploy |
| Q2 | Self-service database provisioning | Remove DBA bottleneck |

Platform Anti-Patterns

Developer Experience (DX) Optimization

DX Metrics

DX Improvement Roadmap

## DX Improvement Plan

### Quick Wins (< 1 week each)
- [ ] Pre-configured dev containers / devbox
- [ ] One-command project setup script
- [ ] PR template with checklist
- [ ] Slack bot for deploy status
- [ ] Auto-assign code reviewers

### Medium Term (1-4 weeks)
- [ ] Reduce CI build time by 50%
- [ ] Local development matches production (docker-compose)
- [ ] API documentation auto-generated from code
- [ ] Error messages link to runbooks
- [ ] Feature flag self-service UI

### Long Term (1-3 months)
- [ ] Internal developer portal (Backstage/custom)
- [ ] Self-service infrastructure provisioning
- [ ] Automated dependency updates (Renovate)
- [ ] Golden path templates for new services
- [ ] DX survey and tracking dashboard

DX Survey Template

## Developer Experience Survey (Quarterly)

Rate 1-5 (1 = terrible, 5 = excellent):

### Development
1. How easy is it to set up your local dev environment?
2. How reliable is your local dev environment?
3. How fast is your CI/CD pipeline?
4. How easy is it to find and understand documentation?

### Collaboration
5. How efficient is your code review process?
6. How well does cross-team collaboration work?
7. How effective are your team's meetings?

### Operations
8. How manageable is on-call?
9. How good are your monitoring and alerting tools?
10. How confident are you in deploying to production?

### Growth
11. How supported do you feel in your career growth?
12. How much time do you spend on meaningful work vs. toil?

### Open Ended
13. What is the biggest time-waster in your day?
14. If you could change one thing about engineering, what would it be?

Release Management at Scale

Release Strategy Options

Release Process (Multi-Team)

## Release Checklist: v[X.Y.Z]

### Pre-Release (T-2 days)
- [ ] All feature branches merged to release branch
- [ ] Release branch passes all tests
- [ ] Cross-team integration tests passing
- [ ] Dependent services compatible (API contracts)
- [ ] Database migrations tested
- [ ] Feature flags configured for new features
- [ ] Rollback plan documented

### Release Day (T-0)
- [ ] Release branch deployed to staging
- [ ] QA sign-off on staging
- [ ] Monitoring dashboards reviewed (baseline)
- [ ] On-call team briefed
- [ ] Canary deployment initiated
- [ ] Canary metrics monitored (error rate, latency, business KPIs)
- [ ] Full rollout completed
- [ ] Post-deploy verification

### Post-Release (T+1)
- [ ] Metrics compared to baseline
- [ ] No regression in error rates or latency
- [ ] Customer support briefed on changes
- [ ] Release notes published
- [ ] Feature flags cleaned up (remove old)
- [ ] Retrospective scheduled (if issues occurred)

Adoption

vibeeval/vp-engineering

$ install --global

Security Scan Results

SKILL.md

VP Engineering Perspective

Engineering Org Design (Team Topologies)

Team Types

Interaction Modes

Org Design Template

Org Design Anti-Patterns

Process Improvement (Agile Maturity)

Agile Maturity Model

Process Improvement Framework

Common Process Fixes

Cross-Team Dependency Management

Dependency Mapping

Dependency Resolution Strategies

Dependency Anti-Patterns

Engineering Culture Building

Culture Pillars

Culture Building Practices

OKR Setting for Engineering

OKR Template

OKR Anti-Patterns

Incident Management Maturity

Maturity Levels

Incident Management Framework

Post-Incident Review Quality Checklist

Platform Team Strategy

Platform Team Charter

Platform Anti-Patterns

Developer Experience (DX) Optimization

DX Metrics

DX Improvement Roadmap

DX Survey Template

Release Management at Scale

Release Strategy Options

Release Process (Multi-Team)

Release Metrics

Release Anti-Patterns

Related Skills

vibeeval/workflow-router

vibeeval/wiring

vibeeval/websocket-patterns

vibeeval/visual-verdict

vibeeval/vp-engineering

$ install --global

Security Scan Results

SKILL.md

VP Engineering Perspective

Engineering Org Design (Team Topologies)

Team Types

Interaction Modes

Org Design Template

Org Design Anti-Patterns

Process Improvement (Agile Maturity)

Agile Maturity Model

Process Improvement Framework

Common Process Fixes

Cross-Team Dependency Management

Dependency Mapping

Dependency Resolution Strategies

Dependency Anti-Patterns

Engineering Culture Building

Culture Pillars

Culture Building Practices

OKR Setting for Engineering

OKR Template

OKR Anti-Patterns

Incident Management Maturity

Maturity Levels

Incident Management Framework

Post-Incident Review Quality Checklist

Platform Team Strategy

Platform Team Charter

Platform Anti-Patterns

Developer Experience (DX) Optimization

DX Metrics

DX Improvement Roadmap