areas/software/mlops/skills/inference-serving/SKILL.md
# Skill: Inference Serving
## When to load

When deploying a model to an API endpoint or optimizing inference latency.
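## FastAPI Inference Endpoint

```python
import mlflow.pyfunc
import structlog  # assumption: the keyword-argument log call below implies a structlog-style logger
from fastapi import FastAPI

# PredictionRequest, PredictionResponse, load_preprocessor, log_prediction and
# FALLBACK_PROBABILITY are assumed to be defined elsewhere in the service.
app = FastAPI()
logger = structlog.get_logger()


@app.on_event("startup")
def load_model():
    # Load the model and preprocessor once at startup, not on every request.
    app.state.model = mlflow.pyfunc.load_model("models:/churn-predictor/Production")
    app.state.preprocessor = load_preprocessor()


@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    try:
        features = app.state.preprocessor.transform([request.features])
        probability = app.state.model.predict(features)[0]
        log_prediction(request.user_id, request.features, float(probability))
        return PredictionResponse(
            user_id=request.user_id,
            churn_probability=float(probability),
        )
    except Exception as e:
        # Degrade gracefully: log the failure and return a fallback score.
        logger.error("Inference failed", error=str(e))
        return PredictionResponse(user_id=request.user_id, churn_probability=FALLBACK_PROBABILITY)
```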
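The transform, predict, and fall-back flow of the endpoint can be exercised without FastAPI or MLflow, which may help when unit-testing the error path. The sketch below is one possible way to do that, not the skill's own implementation. Everything in it is hypothetical: the `Prediction` dataclass, the `is_fallback` flag, the stub classes, and the value `0.5` for `FALLBACK_PROBABILITY` (the original does not state what the constant is).

```python
import logging
from dataclasses import dataclass
from typing import Any, Sequence

logger = logging.getLogger("inference")

# Hypothetical value; the original does not define FALLBACK_PROBABILITY.
FALLBACK_PROBABILITY = 0.5


@dataclass
class Prediction:
    user_id: str
    churn_probability: float
    is_fallback: bool  # hypothetical flag, not part of the original response


def predict_with_fallback(preprocessor: Any, model: Any, user_id: str,
                          features: Sequence[float]) -> Prediction:
    """Transform, predict, and return a default score on any failure."""
    try:
        transformed = preprocessor.transform([list(features)])
        probability = float(model.predict(transformed)[0])
        return Prediction(user_id, probability, is_fallback=False)
    except Exception:
        # logger.exception records the traceback alongside the message.
        logger.exception("Inference failed for user %s", user_id)
        return Prediction(user_id, FALLBACK_PROBABILITY, is_fallback=True)


class IdentityPreprocessor:
    """Stub preprocessor that returns its input unchanged."""

    def transform(self, rows):
        return rows


class MeanModel:
    """Stub model that scores each row as the mean of its features."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


class BrokenModel:
    """Stub model that always fails, to exercise the fallback path."""

    def predict(self, rows):
        raise RuntimeError("model unavailable")
```

Calling `predict_with_fallback(IdentityPreprocessor(), BrokenModel(), "u2", [0.2, 0.4])` takes the fallback branch. The `is_fallback` flag is a design suggestion only: in the endpoint above, a fallback response looks the same to the client as a real prediction, and a flag like this would let callers and monitoring tell the two apart.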