microservices-ai-integration/SKILL.md
Integrating AI into a microservices architecture — AI model server as a microservice, AI gateway pattern, async AI job pipeline, AI-enhanced orchestration (Kubeflow, Seldon Core), and wiring the AI metering/billing layer into a distributed...
… microservices-ai-integration or would be better handled by a more specific companion skill.
… SKILL.md first, then load only the referenced deep-dive files that are necessary for the task.
Treat AI as a dedicated, independently deployable microservice: the AI Service. All AI API calls from across the system route through it. This service enforces the AI Module Gate, Token Budget Guard, and Token Ledger (from ai-architecture-patterns) at a single, auditable point.
```
[Any Service] → [AI Gateway] → [AI Service] → [External AI Provider API]
                                 ↓        ↓
                          Gate Check      Token Ledger
                          Budget Guard    ai_usage_log
```
The AI Service is the only service that talks to external AI provider APIs (Anthropic, OpenAI, DeepSeek, Gemini). All other services call the AI Service — never the external API directly.
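Why: routing every call through one service applies the AI Module Gate, Token Budget Guard, and Token Ledger at a single, auditable point, so no service can reach a provider without being gated, budgeted, and metered.
AI Service completion contract: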
```
POST /ai/complete
{
  "tenant_id": 42,
  "user_id": 101,
  "feature_slug": "sales-summary",
  "model": "claude-haiku-4-5",
  "system_prompt": "You are a sales analyst...",
  "user_message": "Summarise today's sales: ...",
  "max_tokens": 400
}

→ 200 OK
{
  "content": "Today's total sales were UGX 2,450,000...",
  "input_tokens": 312,
  "output_tokens": 87,
  "cost_usd": 0.000598,
  "request_id": "req_abc123"
}

→ 402 Payment Required (budget exhausted)
{ "error": "ai_budget_exceeded", "message": "Monthly AI budget exhausted for tenant 42" }

→ 403 Forbidden (module not active)
{ "error": "ai_module_inactive", "message": "AI module not activated for tenant 42" }
```
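The cost_usd field is the token counts priced per million tokens. A minimal sketch, assuming the AI Service keeps a per-model price table; the function name is hypothetical, and the rates ($0.80 input, $4.00 output per million tokens) are placeholders chosen because they reproduce the example's 0.000598, not the provider's current list price.

```php
<?php
declare(strict_types=1);

// Hypothetical price table, USD per million tokens. Placeholder rates that reproduce the
// example response above; replace them with the provider's current prices.
const AI_PRICES_PER_MTOK = [
    'claude-haiku-4-5' => ['input' => 0.80, 'output' => 4.00],
];

/** cost = (input_tokens x input_rate + output_tokens x output_rate) / 1,000,000, rounded to 6 dp. */
function calculateCostUsd(string $model, int $inputTokens, int $outputTokens): float
{
    if (!array_key_exists($model, AI_PRICES_PER_MTOK)) {
        throw new InvalidArgumentException("No price configured for model {$model}");
    }
    $rates = AI_PRICES_PER_MTOK[$model];
    $cost = ($inputTokens * $rates['input'] + $outputTokens * $rates['output']) / 1_000_000;
    return round($cost, 6);
}

printf("%.6f\n", calculateCostUsd('claude-haiku-4-5', 312, 87)); // prints 0.000598
```

Failing loudly on an unknown model avoids silently logging a zero cost for calls that were actually billed.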
```php
<?php
// app/Http/Controllers/AICompletionController.php
namespace App\Http\Controllers;

use Illuminate\Http\JsonResponse;
// AICompletionRequest, AIMeteredClient, AIRequest, AIInputSanitiser, AIOutputValidator and the two
// exceptions belong to the ai-architecture-patterns layer; import them from their own namespaces.

class AICompletionController extends Controller
{
    public function complete(AICompletionRequest $request, AIMeteredClient $client): JsonResponse
    {
        try {
            // Gate check, budget guard, provider call and ledger write all happen inside call()
            $response = $client->call(
                tenantId: $request->tenant_id,
                userId: $request->user_id,
                featureSlug: $request->feature_slug,
                request: new AIRequest(
                    model: $request->model,
                    systemPrompt: $request->system_prompt,
                    userMessage: AIInputSanitiser::sanitise($request->user_message),
                    maxTokens: $request->max_tokens ?? 1024,
                )
            );

            return response()->json([
                'content'       => AIOutputValidator::sanitiseText($response->content),
                'input_tokens'  => $response->inputTokens,
                'output_tokens' => $response->outputTokens,
                'cost_usd'      => $response->costUsd,
                'request_id'    => $response->requestId,
            ]);
        } catch (AIModuleNotActiveException $e) {
            // Tenant has not activated the AI module
            return response()->json(['error' => 'ai_module_inactive', 'message' => $e->getMessage()], 403);
        } catch (AIBudgetExceededException $e) {
            // Budget exhausted: not transient, so callers must not retry
            return response()->json(['error' => 'ai_budget_exceeded', 'message' => $e->getMessage()], 402);
        }
    }
}
```
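The controller delegates everything to AIMeteredClient, whose internals live in ai-architecture-patterns. The framework-free sketch below is not that class. It only illustrates the order in which the AI Service can apply its controls (module gate, budget guard, provider call, then ledger write), so that no call reaches the provider unchecked or goes unrecorded. The two exception names match the controller. Everything else, including MeteredPipelineSketch, the in-memory arrays standing in for the module, budget and ai_usage_log tables, and the stub provider closure, is hypothetical.

```php
<?php
declare(strict_types=1);

class AIModuleNotActiveException extends RuntimeException {}
class AIBudgetExceededException extends RuntimeException {}

/** Hypothetical illustration of the check order inside the AI Service. */
final class MeteredPipelineSketch
{
    /** @var array<int, array{tenant_id: int, user_id: int, feature: string, cost_usd: float}> stand-in for ai_usage_log */
    public array $usageLog = [];

    /**
     * @param array<int, bool>  $activeModules tenant_id => AI module active
     * @param array<int, float> $budgetsUsd    tenant_id => monthly budget in USD
     * @param callable          $provider      stub for the external API: fn(string): array{content: string, cost_usd: float}
     */
    public function __construct(
        private array $activeModules,
        private array $budgetsUsd,
        private $provider,
    ) {}

    public function call(int $tenantId, int $userId, string $feature, string $prompt): array
    {
        // 1. Module gate (the controller maps this exception to 403).
        if (!($this->activeModules[$tenantId] ?? false)) {
            throw new AIModuleNotActiveException("AI module not activated for tenant {$tenantId}");
        }
        // 2. Budget guard (mapped to 402). A tenant with no budget row gets 0.0, so no calls.
        //    Checking before the call means the last permitted call can overshoot by its own cost.
        if ($this->spentUsd($tenantId) >= ($this->budgetsUsd[$tenantId] ?? 0.0)) {
            throw new AIBudgetExceededException("Monthly AI budget exhausted for tenant {$tenantId}");
        }
        // 3. The only place the external provider is called.
        $result = ($this->provider)($prompt);
        // 4. Ledger write happens before the result is returned to the caller.
        $this->usageLog[] = [
            'tenant_id' => $tenantId,
            'user_id'   => $userId,
            'feature'   => $feature,
            'cost_usd'  => $result['cost_usd'],
        ];
        return $result;
    }

    private function spentUsd(int $tenantId): float
    {
        $rows = array_filter($this->usageLog, fn (array $r): bool => $r['tenant_id'] === $tenantId);
        return array_sum(array_column($rows, 'cost_usd'));
    }
}
```

For example, with a 0.001 USD budget and a stub provider that costs 0.0006 per call, the first two calls succeed and the third throws AIBudgetExceededException.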
For AI features where response time exceeds 3s (report generation, batch analysis, document extraction), use an async queue pattern.
```
User Request → POST /reports/generate
             → 202 Accepted { "job_id": "job_xyz" }
             → Job dispatched to ai-reports queue

Worker Service → dequeues job
               → calls AI Service (POST /ai/complete)
               → stores result in reports table
               → publishes ReportCompleted event

User polls → GET /reports/job_xyz/status
           → { "status": "complete", "download_url": "/reports/job_xyz/download" }
```
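Polling only works if the job row moves through a small, predictable set of states. A minimal sketch of that lifecycle and of the status response. It assumes the status values used in this section's code and the download URL pattern from the example. The transition table, including letting processing repeat once per queue retry and letting a queued job fail before work starts, is an assumption, and the function names are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical lifecycle for AI report jobs, using the status values from this section.
// "processing" may repeat because each queue retry re-enters handle().
const JOB_TRANSITIONS = [
    'queued'     => ['processing', 'failed'],
    'processing' => ['processing', 'complete', 'failed'],
    'complete'   => [],   // terminal
    'failed'     => [],   // terminal
];

/** Validate a status change; throws instead of silently moving a job backwards. */
function transitionJob(string $from, string $to): string
{
    $allowed = array_key_exists($from, JOB_TRANSITIONS) ? JOB_TRANSITIONS[$from] : [];
    if (!in_array($to, $allowed, true)) {
        throw new LogicException("Illegal job transition: {$from} -> {$to}");
    }
    return $to;
}

/** Body for GET /reports/{job_id}/status: a download link only once the job is complete. */
function jobStatusPayload(string $jobId, string $status): array
{
    $payload = ['status' => $status];
    if ($status === 'complete') {
        $payload['download_url'] = "/reports/{$jobId}/download";
    }
    return $payload;
}
```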
Or push-based with WebSocket/SSE:
```
Worker publishes ReportCompleted event
  → notification-service listens
  → pushes in-app notification to user ("Your report is ready")
```
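For the SSE variant, the notification service writes each event to an open response served with Content-Type: text/event-stream. A minimal sketch of framing the report-ready notification as one SSE message: an optional id: line, an event: line, a data: line carrying JSON, and a blank line ending the message. The event name report.completed, the payload fields and the function name are assumptions. Only the framing follows the SSE format.

```php
<?php
declare(strict_types=1);

/**
 * Frame one Server-Sent Events message. json_encode escapes newlines inside strings,
 * so the JSON never spans lines and a single data: line is enough.
 */
function sseFrame(string $event, array $data, ?string $id = null): string
{
    $frame = '';
    if ($id !== null) {
        $frame .= "id: {$id}\n";      // lets a reconnecting browser send Last-Event-ID
    }
    $frame .= "event: {$event}\n";
    $frame .= 'data: ' . json_encode($data, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES) . "\n";
    return $frame . "\n";             // blank line terminates the message
}

// Hypothetical payload pushed when the worker's completion event arrives.
echo sseFrame('report.completed', [
    'job_id'       => 'job_xyz',
    'message'      => 'Your report is ready',
    'download_url' => '/reports/job_xyz/download',
], 'job_xyz');
```

WebSockets need a persistent bidirectional server. SSE rides on plain HTTP, which is usually enough for one-way "your report is ready" pushes.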
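```php
<?php
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Http;

// Dispatching the AI job
class GenerateReportController extends Controller
{
    public function generate(Request $request): JsonResponse
    {
        $report = AIReportJob::create([
            'tenant_id' => $request->tenant_id,
            'user_id'   => $request->user_id,
            'params'    => $request->params,
            'status'    => 'queued',
        ]);

        dispatch(new ProcessAIReportJob($report->id))->onQueue('ai-reports');

        return response()->json(['job_id' => $report->id, 'status' => 'queued'], 202);
    }
}

// The queued job
class ProcessAIReportJob implements ShouldQueue
{
    use InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;
    public int $timeout = 60; // 60s max per attempt

    // The original referenced $this->jobId without declaring it
    public function __construct(public int|string $jobId) {}

    public function handle(): void
    {
        $report = AIReportJob::findOrFail($this->jobId);
        $report->update(['status' => 'processing']);

        $response = Http::timeout(30)->post('http://ai-service/ai/complete', [
            'tenant_id'     => $report->tenant_id,
            'user_id'       => $report->user_id,
            'feature_slug'  => 'report-generation',
            'model'         => 'claude-haiku-4-5',
            'system_prompt' => '...',
            'user_message'  => $this->buildPrompt($report->params), // app-specific helper, not shown
            'max_tokens'    => 2000,
        ]);

        if (in_array($response->status(), [402, 403], true)) {
            // Budget exhausted or module inactive: not transient, so fail now instead of retrying
            $this->fail(new \RuntimeException($response->json('error', 'ai_request_rejected')));
            return;
        }
        $response->throw(); // other 4xx/5xx: throw so the queue retries up to $tries

        $report->update([
            'status'   => 'complete',
            'result'   => $response->json('content'),
            'cost_usd' => $response->json('cost_usd'),
        ]);

        event(new AIReportCompleted($report->tenant_id, $report->user_id, $report->id));
    }

    public function failed(\Throwable $e): void
    {
        AIReportJob::find($this->jobId)?->update(['status' => 'failed', 'error' => $e->getMessage()]);
    }
}
```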
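With $tries = 3 and no backoff configured, the retries hit a struggling AI Service back to back. Laravel jobs can declare a $backoff property or a backoff() method returning an array of seconds, one entry per retry. A minimal sketch of a capped exponential schedule for that array. The helper name and the 10 s base and 60 s cap are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Capped exponential backoff schedule in seconds, one entry per retry.
 * A job with $tries attempts has $tries - 1 retries: base, 2x base, 4x base, ... up to $capSeconds.
 */
function exponentialBackoff(int $tries, int $baseSeconds, int $capSeconds): array
{
    $delays = [];
    for ($retry = 0; $retry < $tries - 1; $retry++) {
        $delays[] = min($capSeconds, $baseSeconds * (2 ** $retry));
    }
    return $delays;
}

// With $tries = 3 as in ProcessAIReportJob, two delays are produced (one per retry).
$schedule = exponentialBackoff(3, 10, 60);
```

Inside the job this could back `public function backoff(): array { return exponentialBackoff($this->tries, 10, 60); }`. Adding random jitter to each entry would spread out retries when many tenants hit the same outage.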
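Layer the AI Service behind the API gateway to enforce per-tenant rate limits, a longer read timeout for slow AI calls, and health-check-based failover between AI Service instances: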
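```nginx
# NGINX: AI Service upstream with per-tenant rate limiting and health checks.
# Assumes the gateway sets X-Tenant-ID; requests with an empty key are not rate-limited.
# http {} context:
limit_req_zone $http_x_tenant_id zone=ai_per_tenant:10m rate=100r/m;

upstream ai_service {
    zone ai_service 64k;      # shared memory, required by active health_check
    least_time last_byte;     # NGINX Plus only; use least_conn on open-source NGINX
    server ai-service-1.internal:8080 max_fails=3 fail_timeout=30s;
    server ai-service-2.internal:8080 max_fails=3 fail_timeout=30s;
}

# server {} context:
location /ai/ {
    # Rate limit: 100 AI requests per minute per tenant, bursts of 20 served without delay
    limit_req zone=ai_per_tenant burst=20 nodelay;
    proxy_pass http://ai_service;
    proxy_read_timeout 60s;   # AI calls can be slow
    # Active health checks eject failing instances (NGINX Plus only);
    # max_fails/fail_timeout above give passive ejection on open-source NGINX
    health_check uri=/health interval=5s fails=1;
}
```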
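NGINX documents limit_req as a leaky bucket. Each key (here the tenant) drains at the configured rate, and up to burst excess requests are tolerated; nodelay serves those immediately instead of spacing them out. The sketch below approximates that bookkeeping in PHP, for example to unit-test expected limits or to enforce the same per-tenant limit inside the AI Service. It is a simplification, not NGINX's code, and the class name is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Per-key leaky bucket approximating NGINX limit_req with nodelay:
 * each key's "excess" drains at $ratePerSecond; a request adds 1 and is rejected
 * when the new excess would exceed $burst. Rejected requests do not change state.
 */
final class LeakyBucketLimiter
{
    /** @var array<string, array{excess: float, last: float}> */
    private array $state = [];

    public function __construct(private float $ratePerSecond, private int $burst) {}

    /** @param float $now caller-supplied clock in seconds, so behaviour is testable */
    public function allow(string $key, float $now): bool
    {
        if (!isset($this->state[$key])) {
            // First request for a key starts at zero excess, so 1 + burst requests pass at once.
            $this->state[$key] = ['excess' => 0.0, 'last' => $now];
            return true;
        }
        $prev = $this->state[$key];
        $drained = $this->ratePerSecond * max(0.0, $now - $prev['last']);
        $excess = max(0.0, $prev['excess'] - $drained + 1.0);
        if ($excess > $this->burst) {
            return false; // over the burst allowance: NGINX would answer 503 by default
        }
        $this->state[$key] = ['excess' => $excess, 'last' => $now];
        return true;
    }
}

// 100 requests per minute per tenant with burst=20, as in the config above.
$limiter = new LeakyBucketLimiter(100 / 60, 20);
```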
Source: Pandiya & Charankar Ch. 3
AI can enhance the orchestration layer of a microservices system:
AI analyses historical traffic patterns to pre-scale services before load spikes.
AI models predict which service instances are likely to fail before they do, triggering preemptive migration.
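A simple way to pre-scale from historical traffic is to forecast the next interval's request rate and size the replica count from it. A minimal sketch that uses a moving average as the forecast, a deliberately naive stand-in for whatever model does the real prediction. The per-replica capacity, headroom factor, replica bounds and function names are all assumptions.

```php
<?php
declare(strict_types=1);

/** Forecast next-interval load as the mean of the last $window observations. */
function forecastLoad(array $history, int $window): float
{
    $recent = array_slice($history, -$window);
    if ($recent === []) {
        throw new InvalidArgumentException('history is empty');
    }
    return array_sum($recent) / count($recent);
}

/**
 * Replicas needed for the forecast with headroom, clamped to [$min, $max].
 * $capacityPerReplica is the load one replica sustains (a measured, assumed figure).
 */
function plannedReplicas(float $forecast, float $capacityPerReplica, float $headroom, int $min, int $max): int
{
    $needed = (int) ceil($forecast * $headroom / $capacityPerReplica);
    return max($min, min($max, $needed));
}

// e.g. requests/minute over the last four intervals, one replica handling ~80 req/min
$replicas = plannedReplicas(forecastLoad([50, 100, 200, 300], 3), 80.0, 1.5, 2, 10);
```

The result would feed a scaler ahead of the spike (for example by patching a Deployment's replica count) rather than reacting after CPU has already climbed.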
Seldon Core extends Kubernetes to serve ML models as REST/gRPC services with the same lifecycle as any other microservice (canary deployments, A/B testing, traffic splitting).
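```yaml
# SeldonDeployment (Seldon Core v1 API): serve a scikit-learn model.
# Seldon Core 2 replaces this resource with Model and Server resources.
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: risk-predictor
spec:
  predictors:
    - name: default
      graph:
        name: risk-model
        implementation: SKLEARN_SERVER   # prepackaged scikit-learn model server
        modelUri: gs://my-bucket/risk-model
      replicas: 2
      traffic: 100                       # % of traffic; split across predictors for canaries
```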
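Once deployed, a Seldon Core v1 model is typically reached over REST through the cluster ingress at /seldon/&lt;namespace&gt;/&lt;deployment&gt;/api/v1.0/predictions, with a JSON body whose data.ndarray holds one feature vector per row. A minimal sketch of building that request from the AI Service or any other caller. The ingress host, namespace, feature values and function name are assumptions, and the exact path depends on how the ingress (Istio or Ambassador) is configured.

```php
<?php
declare(strict_types=1);

/**
 * Build a Seldon Core v1 REST prediction request. Host and namespace are hypothetical;
 * only the path shape and the data.ndarray payload follow Seldon's v1 protocol.
 *
 * @param array<int, array<int, float>> $rows one feature vector per row
 * @return array{url: string, body: string}
 */
function seldonPredictionRequest(string $ingressHost, string $namespace, string $deployment, array $rows): array
{
    $url = sprintf(
        'http://%s/seldon/%s/%s/api/v1.0/predictions',
        $ingressHost,
        rawurlencode($namespace),
        rawurlencode($deployment)
    );
    $body = json_encode(['data' => ['ndarray' => $rows]], JSON_THROW_ON_ERROR);
    return ['url' => $url, 'body' => $body];
}

// POST $req['body'] to $req['url'] with Content-Type: application/json.
$req = seldonPredictionRequest('ingress.internal', 'default', 'risk-predictor', [[0.4, 12.0, 3.0]]);
```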
When to use Seldon Core: When you have a custom ML model (not a foundation model) that needs to scale, version, and update independently.
In a distributed system, metering must still be centralised (at the AI Service). Services must not try to record their own token usage.
Rule: The AI Service is the sole writer to ai_usage_log. All other services are readers (for their own tenant/user data via the usage API).
```
finance-service    ──┐
enrollment-service   ├── POST /ai/complete → AI Service writes to ai_usage_log
report-service     ──┘

admin-service → GET /ai/usage?tenant_id=42&period=2026-04 → AI Service reads usage
```
Usage API in the AI Service:
```
GET /ai/usage?tenant_id=42&period=2026-04&group_by=user
→ { "period": "2026-04", "users": [ { "user_id": 101, "calls": 82, "tokens": 14500, "cost_usd": 0.0234 } ] }

GET /ai/usage/tenants?period=2026-04   (super-admin only)
→ [ { "tenant_id": 42, "tier": "growth", "budget_usd": 10.00, "spent_usd": 3.21, "pct_used": 32.1 } ]
```
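Because the AI Service is the only writer, the usage API is a read-side aggregation over ai_usage_log. A minimal sketch that groups ledger rows by user for one tenant and period and computes pct_used. It assumes each row carries tenant_id, user_id, a YYYY-MM period tag, total tokens and cost_usd; the real schema lives in ai-metering-billing, and the function names are hypothetical. Rounding pct_used to one decimal mirrors the example response.

```php
<?php
declare(strict_types=1);

/**
 * Group ai_usage_log rows for one tenant and period by user.
 * Assumed row shape: ['tenant_id'=>int, 'user_id'=>int, 'period'=>string, 'tokens'=>int, 'cost_usd'=>float].
 */
function usageByUser(array $rows, int $tenantId, string $period): array
{
    $users = [];
    foreach ($rows as $row) {
        if ($row['tenant_id'] !== $tenantId || $row['period'] !== $period) {
            continue; // tenant isolation: never mix other tenants' rows into the response
        }
        $uid = $row['user_id'];
        $users[$uid] ??= ['user_id' => $uid, 'calls' => 0, 'tokens' => 0, 'cost_usd' => 0.0];
        $users[$uid]['calls']++;
        $users[$uid]['tokens'] += $row['tokens'];
        $users[$uid]['cost_usd'] += $row['cost_usd'];
    }
    return ['period' => $period, 'users' => array_values($users)];
}

/** Share of the monthly budget already spent, as a percentage rounded to one decimal. */
function pctUsed(float $spentUsd, float $budgetUsd): float
{
    return $budgetUsd > 0 ? round($spentUsd / $budgetUsd * 100, 1) : 0.0;
}
```

With the tenant row in the example, pctUsed(3.21, 10.00) gives 32.1.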
The AI Service is a critical dependency. Apply extra resilience:
- If the AI Service is down, synchronous callers get a 503 immediately; async callers' jobs remain in the queue and are processed when the service recovers.
- Do not retry on 402 Budget Exceeded: an exhausted budget is not a transient error.
- Write token usage to ai_usage_log before the response is returned.
- Expose a /health endpoint that checks provider reachability.

See also:
- ai-architecture-patterns: AI Module Gate, Budget Guard, and Token Ledger detail
- ai-metering-billing: token ledger schema and billing
- microservices-resilience: circuit breaker and health check implementation
- microservices-communication: async queue pattern for AI jobs