skills/vertex-inference/SKILL.md
<!-- Copyright 2026 Google LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific l
npx skillsauth add GoogleCloudPlatform/vertex-ai-samples skills/vertex-inferenceInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This skill provides instructions for authenticating and connecting to Google Cloud Vertex AI to use Generative AI models. It covers both First-Party (Gemini) and Third-Party (OpenMaaS) models.
[!TIP] Sample Scripts: This skill includes fully functional sample scripts in the
scripts/directory (e.g.,scripts/openmaas_openai_sdk.py). When running these scripts, ALWAYS create and use a local virtual environment:python3 -m venv .venv && source .venv/bin/activate pip install -r scripts/requirements.txtVerify All Scripts: You can run all scripts at once to verify your setup:
./scripts/verify_all.sh
[!IMPORTANT] CRITICAL: Model IDs & Availability
- Gemini Models: See Gemini Models for valid Model IDs and Regions.
- OpenMaaS Models: See Use Open Models on Vertex AI for Llama, DeepSeek, Qwen, etc.
- Incomplete Lists: The Model IDs listed in this skill are examples only and may be incomplete or outdated.
- Action: Always verify the Model ID and Region using the links above before generating code.
Before running any code, ensure you are authenticated with Application Default Credentials (ADC) and have the necessary API enabled.
gcloud auth application-default login
gcloud services enable aiplatform.googleapis.com
For Gemini models (e.g., gemini-2.5-pro, gemini-3-flash-preview), the GenAI SDK (google-genai) is the PREFERRED method. The legacy vertexai SDK is still supported but GenAI SDK is recommended for new projects.
[!IMPORTANT] Preview Models (including Gemini 3.1) are often ONLY available in the
globalregion. Stable models are available inus-central1and other regions.
google-genai) is PREFERRED. Use OpenAI SDK for compatibility, or Legacy SDK (vertexai) if needed.pip install google-genai
See scripts/gemini_genai_sdk.py for the complete code.
Use the standard OpenAI SDK with the Vertex AI endpoint. This is great for cross-compatibility.
See scripts/gemini_openai_sdk.py for the complete code.
The legacy vertexai SDK is still widely used but google-genai is preferred for new Gemini projects.
See scripts/gemini_vertexai_sdk.py for the complete code.
Documentation: Google GenAI SDK
Documentation: Vertex AI Gemini Models
For OpenMaaS (Model-as-a-Service) models, the HIGHLY RECOMMENDED approach is to use the standard OpenAI SDK with a specific Vertex AI endpoint.
[!WARNING] While
GenerativeModelcan support some OpenMaaS models, it is discouraged. Use the OpenAI SDK for best compatibility (especially for Chat Completions).
pip install openai google-auth
You MUST use a Google Cloud OAuth access token as the API key for the OpenAI SDK.
import google.auth
from google.auth.transport.requests import Request
def get_gcp_access_token():
creds, _ = google.auth.default()
creds.refresh(Request())
return creds.token
> [!NOTE]
> Google Cloud access tokens typically expire after 1 hour. The `get_gcp_access_token()` function above retrieves a *fresh* token at the time it is called.
> For long-running applications, you implement a refresh mechanism. See [Refresh the access token](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/openai-sdk-auth#refresh-token) for details.
https://aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/global/endpoints/openapihttps://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{REGION}/endpoints/openapiSee scripts/openmaas_openai_sdk.py for the complete code.
[!TIP] Alternative: Environment Variables You can set environment variables in your shell instead of updating the code.
export OPENAI_BASE_URL="https://aiplatform.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/global/endpoints/openapi" export OPENAI_API_KEY="$(gcloud auth print-access-token)"Then initialize the client without arguments:
client = OpenAI()
The following models support the legacy Completions API: zai-org/glm-5-maas, moonshotai/kimi-k2-thinking-maas, minimaxai/minimax-m2-maas, deepseek-ai/deepseek-v3.1-maas, and deepseek-ai/deepseek-v3.2-maas.
response = client.completions.create(
model="deepseek-ai/deepseek-v3.2-maas",
prompt="Once upon a time",
max_tokens=100
)
print(response.choices[0].text)
# Verify specific Embedding Model ID on Model Garden (e.g., intfloat/multilingual-e5-small)
response = client.embeddings.create(
model="intfloat/multilingual-e5-large-maas",
input="The quick brown fox jumps over the lazy dog",
)
print(response.data[0].embedding)
The google-genai SDK can also access OpenMaaS models via the vertexai backend.
See scripts/openmaas_genai_sdk.py for the complete code.
[!IMPORTANT] Model ID Format: For GenAI SDK with OpenMaaS, you MUST use the full path:
publishers/PUBLISHER/models/MODEL(e.g.,publishers/zai-org/models/glm-5-maas).
For OpenMaaS, you can also use GenerativeModel (if supported).
See scripts/openmaas_vertexai_sdk.py for the complete code.
[!IMPORTANT] Model ID Format: For Vertex AI SDK with OpenMaaS, you MUST use the full path:
publishers/PUBLISHER/models/MODEL.
Documentation: Use Open Models on Vertex AI
[!TIP] Self-Deployment for Control: If you need dedicated hardware (GPUs/TPUs), guaranteed capacity, or specific regional placement not offered by MaaS, you can Self-Deploy these models to Vertex AI Endpoints. Search for the model in Model Garden and click "Deploy" to select your machine type.
[!IMPORTANT] Finding Inference Examples: The list above is a starting point. For the definitive inference snippets (especially for Chat Completions payload structure):
- Consult the Use Open Models on Vertex AI list.
- Click the link for your specific model (e.g., "DeepSeek-V3") to visit its Model Garden page.
- Look for the "Sample Code" or "Use this model" button on the Model Garden page to get the exact
curlor Python code for that specific model version.
[!NOTE] This list is INCOMPLETE. See Use Open Models on Vertex AI for the full list of supported models.
| Model Family | Model ID Examples | Location | Notes |
| :--- | :--- | :--- | :--- |
| Llama 4 | meta/llama-4-maverick-17b-128e-instruct-maas | us-east5 | |
| Llama 4 | meta/llama-4-scout-17b-16e-instruct-maas | us-east5 | |
| Llama 3.3 | meta/llama-3.3-70b-instruct-maas | us-central1 | |
| DeepSeek | deepseek-ai/deepseek-v3.2-maas | global | Global ONLY |
| DeepSeek | deepseek-ai/deepseek-v3.1-maas | us-west2 | US-West2 ONLY |
| DeepSeek | deepseek-ai/deepseek-r1-0528-maas | us-central1 | |
| Qwen 3 | qwen/qwen3-coder-480b-a35b-instruct-maas | global | |
| Qwen 3 | qwen/qwen3-next-80b-a3b-instruct-maas | global | |
| Kimi | moonshotai/kimi-k2-thinking-maas | global | |
| MiniMax | minimaxai/minimax-m2-maas | global | |
| GLM | zai-org/glm-4.7-maas, zai-org/glm-5-maas | global | |
us-central1 or global regions.us-central1, europe-west4, and many other regions.us-central1 or global.development
<!-- Copyright 2026 Google LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific l
development
<!-- Copyright 2026 Google LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific l
development
<!-- Copyright 2026 Google LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific l
development
<!-- Copyright 2026 Google LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific l