skills/gemini-files/SKILL.md
Upload and manage files using Google Gemini File API via scripts/. Use for uploading images, audio, video, PDFs, and other files for use with Gemini models. Supports file upload, status checking, and file management. Triggers on "upload file", "file API", "upload image", "upload PDF", "upload video", "file management".
npx skillsauth add akrindev/google-studio-skills gemini-files
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Upload and manage files for use with Gemini models through executable scripts, supporting images, audio, video, PDFs, and other file types.
Use this skill when you need to:
Purpose: Upload files to Gemini File API
When to use:
Key parameters:
| Parameter | Description | Example |
|-----------|-------------|---------|
| path | File path (required) | image.jpg |
| --name, -n | Display name | "my-document" |
| --wait, -w | Wait for processing | Flag |
Output: File name, URI, and status information
node scripts/upload.js image.jpg
node scripts/upload.js document.pdf --name "Quarterly Report Q4 2026"
node scripts/upload.js video.mp4 --wait
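The internals of scripts/upload.js are not shown here. As a rough sketch only, an equivalent upload through the @google/genai SDK might look like the following. The helper name `uploadFile` is hypothetical, and `client` is assumed to be a `GoogleGenAI` instance passed in by the caller.

```javascript
// Hypothetical helper, not the skill's actual script.
// `client` is assumed to be a GoogleGenAI instance from @google/genai.
async function uploadFile(client, filePath, { displayName, mimeType } = {}) {
  // Only send the optional fields the caller actually provided.
  const config = {};
  if (displayName) config.displayName = displayName;
  if (mimeType) config.mimeType = mimeType;

  const file = await client.files.upload({ file: filePath, config });

  // Mirror the documented output: file name, URI, and status.
  return { name: file.name, uri: file.uri, state: file.state };
}

// Usage (assumes @google/genai is installed and GEMINI_API_KEY is set):
//   import { GoogleGenAI } from "@google/genai";
//   const client = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
//   const info = await uploadFile(client, "image.jpg", { displayName: "my-document" });
```

Passing the client in, rather than constructing it inside the helper, keeps the function easy to exercise with a stub.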
# 1. Upload image
node scripts/upload.js photo.png --name "product-shot"
# 2. Use with gemini-text for analysis
node skills/gemini-text/scripts/generate.js "Describe this image" --image photo.png
# 1. Upload PDF
node scripts/upload.js research-paper.pdf --name "AI-Research-Paper" --wait
# 2. Extract content with gemini-text
node skills/gemini-text/scripts/generate.js "Extract key findings from this document" --image research-paper.pdf
# 1. Upload multiple files
for file in *.jpg; do
  node scripts/upload.js "$file"
done
# 2. Create batch job using uploaded files (gemini-batch skill)
# 1. Upload audio
node scripts/upload.js interview.mp3 --name "interview-001" --wait
# 2. Process with gemini-text (if transcription available)
node skills/gemini-text/scripts/generate.js "Transcribe and summarize this audio" --image interview.mp3
# 1. Upload video (may take time)
node scripts/upload.js product-demo.mp4 --name "demo-video" --wait
# 2. Analyze with gemini-text
node skills/gemini-text/scripts/generate.js "Analyze this product demo video" --image product-demo.mp4
| Type | Extensions | Max Size | Processing Time |
|------|------------|----------|-----------------|
| Images | jpg, jpeg, png, gif, webp | 20MB | Seconds |
| Audio | mp3, wav, aac, flac | 25MB | Seconds-minutes |
| Video | mp4, mov, avi, webm | 2GB | Minutes-hours |
| Documents | pdf, txt | 50MB | Seconds-minutes |
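The limits in this table can be checked before an upload is attempted. The sketch below is one way to do that. The helper `checkUpload` is hypothetical, the limits are copied from the table rather than verified against the API, and 1MB is assumed to mean 1024 × 1024 bytes.

```javascript
// Limits copied from the table above (assumption: 1MB = 1024 * 1024 bytes).
const MB = 1024 * 1024;
const LIMITS = {
  images: { extensions: ["jpg", "jpeg", "png", "gif", "webp"], maxBytes: 20 * MB },
  audio: { extensions: ["mp3", "wav", "aac", "flac"], maxBytes: 25 * MB },
  video: { extensions: ["mp4", "mov", "avi", "webm"], maxBytes: 2048 * MB },
  documents: { extensions: ["pdf", "txt"], maxBytes: 50 * MB },
};

// Hypothetical helper: classify a file by extension and compare its size
// with the documented limit for that type.
function checkUpload(fileName, sizeBytes) {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();

  for (const [type, { extensions, maxBytes }] of Object.entries(LIMITS)) {
    if (!extensions.includes(ext)) continue;
    if (sizeBytes > maxBytes) {
      return { ok: false, type, reason: `exceeds ${maxBytes / MB}MB limit for ${type}` };
    }
    return { ok: true, type, reason: null };
  }

  // Extension not in the table: no documented limit to compare against.
  return { ok: true, type: "other", reason: null };
}
```

In a real script the size would typically come from `fs.statSync(filePath).size`, so an oversized file is rejected before any bytes are sent.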
The script auto-detects the file type from the extension.
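The mapping the script actually uses is not reproduced here. Assuming the detection resolves to a MIME type, a lookup of this kind could be sketched as follows. The helper `guessMimeType` is hypothetical, and the values are the commonly registered MIME types for the extensions in the table, not ones confirmed from the script.

```javascript
// Assumed extension-to-MIME mapping for the extensions listed above.
const MIME_BY_EXTENSION = {
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  png: "image/png",
  gif: "image/gif",
  webp: "image/webp",
  mp3: "audio/mpeg",
  wav: "audio/wav",
  aac: "audio/aac",
  flac: "audio/flac",
  mp4: "video/mp4",
  mov: "video/quicktime",
  avi: "video/x-msvideo",
  webm: "video/webm",
  pdf: "application/pdf",
  txt: "text/plain",
};

// Hypothetical helper: returns a MIME type, or null when the extension is
// missing or not in the map.
function guessMimeType(fileName) {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1) return null;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return MIME_BY_EXTENSION[ext] ?? null;
}
```

Returning null instead of a default lets the caller decide whether to refuse the file or let the API infer the type.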
| State | Description | Ready for Use |
|-------|-------------|-----------------|
| PROCESSING | File is being analyzed | No |
| ACTIVE | File is ready | Yes |
| FAILED | Processing failed | No |
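These states suggest a simple polling loop: keep fetching the file until it leaves PROCESSING. The sketch below assumes a `client` with the `files.get({ name })` call from @google/genai. The helper name `waitUntilActive` and the interval and timeout defaults are illustrative choices, not values taken from the script.

```javascript
// Hypothetical helper: poll until the file is ACTIVE, fail fast on FAILED,
// and give up after timeoutMs.
async function waitUntilActive(client, name, { intervalMs = 2000, timeoutMs = 600000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const file = await client.files.get({ name });
    if (file.state === "ACTIVE") return file;
    if (file.state === "FAILED") throw new Error(`Processing failed for ${name}`);
    if (Date.now() >= deadline) throw new Error(`Timed out waiting for ${name}`);
    // Still PROCESSING: pause before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage (assumes `client` is a GoogleGenAI instance):
//   const ready = await waitUntilActive(client, uploaded.name);
```

A timeout matters mostly for video, where the table lists processing times of minutes to hours.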
Uploading photo.png...
Uploaded: files/abc123...
URI: gs://generation-tmp/abc123...
State: PROCESSING
Uploading video.mp4...
Uploaded: files/xyz789...
URI: gs://generation-tmp/xyz789...
State: PROCESSING
Waiting for processing...
Still processing...
File ready!
Once uploaded, reference the file by name:
# With gemini-text
node skills/gemini-text/scripts/generate.js "Analyze" --image <uploaded-file-path>
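At the SDK level, an uploaded file is usually passed to a model by its URI and MIME type rather than by a local path. The following is a minimal sketch under that assumption. The helpers `buildFileRequest` and `askAboutFile` are hypothetical, and the model name is a placeholder to replace with whichever model you use.

```javascript
// Hypothetical helper: build a generateContent request that points at an
// already-uploaded file. `file` is assumed to carry `uri` and `mimeType`,
// as returned by the Files API.
function buildFileRequest(file, prompt, model = "gemini-2.5-flash") {
  return {
    model, // placeholder model name (assumption)
    contents: [
      {
        role: "user",
        parts: [
          { fileData: { fileUri: file.uri, mimeType: file.mimeType } },
          { text: prompt },
        ],
      },
    ],
  };
}

// Hypothetical helper: refuse files that are not ready, then ask the model.
// `client` is assumed to be a GoogleGenAI instance from @google/genai.
async function askAboutFile(client, file, prompt) {
  if (file.state !== "ACTIVE") {
    throw new Error(`${file.name} is ${file.state}, not ACTIVE`);
  }
  const response = await client.models.generateContent(buildFileRequest(file, prompt));
  return response.text;
}

// Usage:
//   const file = await client.files.get({ name: "files/abc123..." });
//   console.log(await askAboutFile(client, file, "Analyze"));
```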
npm install @google/genai@latest dotenv@latest
- If a file is still processing, use the --wait flag or check its status later
- Use --wait for files you'll use immediately
- Skip --wait for batch uploads to save time
- Use --name for organization
# Basic upload
node scripts/upload.js image.jpg
# With custom name
node scripts/upload.js document.pdf --name "My Document"
# Wait for processing
node scripts/upload.js video.mp4 --wait
# Multiple files
for file in *.jpg; do node scripts/upload.js "$file"; done
Although the scripts do not cover it, you can also list, inspect, and delete files directly from JavaScript:
import { GoogleGenAI } from "@google/genai";
const client = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
// List all files
const files = await client.files.list(); // list() resolves to an async-iterable pager
for await (const file of files) {
  console.log(`${file.name}: ${file.displayName} (${file.state})`);
}
// Get file info
const file = await client.files.get({ name: "files/abc123..." });
console.log(`State: ${file.state}`);
// Delete file
await client.files.delete({ name: "files/abc123..." });
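The list and delete calls combine naturally into a cleanup routine. As an illustration only, the sketch below removes every file whose processing failed. The helper name `deleteFailedFiles` is hypothetical, and `client` is assumed to be a `GoogleGenAI` instance.

```javascript
// Hypothetical helper: delete all files in the FAILED state and return
// their names.
async function deleteFailedFiles(client) {
  // Collect names first so deletions do not interfere with pagination.
  const failed = [];
  const pager = await client.files.list();
  for await (const file of pager) {
    if (file.state === "FAILED") failed.push(file.name);
  }

  for (const name of failed) {
    await client.files.delete({ name });
  }
  return failed;
}

// Usage:
//   const removed = await deleteFailedFiles(client);
//   console.log(`Deleted ${removed.length} failed file(s)`);
```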