Gemini MCP
37+ tools bridging Google Gemini into Claude Code — text, image, video, audio, deep research, batch jobs, multi-turn image editing and prompt caching, all over stdio.
An MCP (Model Context Protocol) server that gives any MCP-aware client, primarily Claude Code, direct access to Google Gemini's capabilities. It is more than a thin wrapper: it exposes specialized features such as multi-turn image editing sessions that retain context across calls, the asynchronous batch API at half the price, prompt caching, deep research mode, sandboxed code execution and a 30+ voice TTS catalog.
One binary runs in two modes: MCP server (default, stdio) or standalone CLI (when invoked with subcommands).
@modelcontextprotocol/sdk 1.22 · @google/genai 1.34 · dist/index.js with shebang (executable)
Modular layers with a central registry: a tool-groups config maps preset names to register functions; each tool module wires up server.tool() with a Zod schema and a Gemini API call.
src/
├── index.ts            Dual-mode dispatcher: MCP server (default) or CLI (gcli)
├── server.ts           MCP init · tool registry · stdio transport
│                       5-attempt exponential backoff reconnection
├── gemini-client.ts    GoogleGenAI instance · 4 model slots
│                       (Pro · Flash · Image · Video) · ThinkingLevel
├── tool-groups.ts      Maps 19 group IDs → register fns
│                       6 presets: minimal · text · image · research · media · full
├── tools/              25 modules · each calls server.tool() + Zod schema
│   ├── query.ts            gemini-query (Pro/Flash, thinking levels)
│   ├── image-gen.ts        generate-image, image-prompt
│   ├── batch-image-gen.ts  batch-generate · check · list · cancel
│   ├── image-edit.ts       start/continue/end-edit (multi-turn sessions)
│   ├── video.ts            generate-video, check-video (Veo 2.0)
│   ├── research.ts         search, deep-research, followup, check
│   ├── analyze.ts          images, documents, urls, tables, youtube
│   ├── cache.ts            create, query, list, delete
│   ├── code.ts             run-code (sandboxed)
│   └── speech.ts           speak, dialogue, list-voices, count-tokens
└── utils/              Logger (verbose/quiet) · output-dir (XDG-aware)
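The registry pattern in this layout (preset name → group IDs → register functions → server.tool()) could be sketched roughly as below. This is a dependency-free illustration, not the project's code: ToolServer is a hypothetical stand-in for the MCP SDK server, the group IDs and preset contents are invented for the example, and Zod schemas are left out.

```typescript
// Hypothetical stand-in for the MCP server's tool registration surface.
type Handler = (args: Record<string, unknown>) => Promise<string>;

interface ToolServer {
  tool(name: string, description: string, handler: Handler): void;
}

type RegisterFn = (server: ToolServer) => void;

// Group ID -> register function. Group IDs and descriptions are illustrative.
const toolGroups: Record<string, RegisterFn | undefined> = {
  query: (s) =>
    s.tool("gemini-query", "Ask Gemini a question", async (a) => `query: ${String(a["prompt"])}`),
  "image-gen": (s) =>
    s.tool("generate-image", "Generate an image", async (a) => `image: ${String(a["prompt"])}`),
  search: (s) =>
    s.tool("search", "Web search with summary", async (a) => `search: ${String(a["query"])}`),
};

// Preset name -> group IDs. The actual preset contents are an assumption here.
const presets: Record<string, string[] | undefined> = {
  minimal: ["query"],
  image: ["image-gen"],
  research: ["search"],
  full: Object.keys(toolGroups),
};

// Runs the register function of every group in the preset.
function registerPreset(server: ToolServer, preset: string): string[] {
  const groups = presets[preset];
  if (!groups) throw new Error(`Unknown preset: ${preset}`);
  for (const id of groups) {
    const register = toolGroups[id];
    if (!register) throw new Error(`Unknown tool group: ${id}`);
    register(server);
  }
  return groups;
}

// A tiny in-memory server that only records which tool names were registered.
function createRecordingServer(): ToolServer & { names: string[] } {
  const names: string[] = [];
  return {
    names,
    tool(name: string): void {
      names.push(name);
    },
  };
}
```

Registering the "image" preset against the recording server leaves only generate-image in its list, which is the effect the presets are meant to have on the schemas a client sees.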
Text & reasoning
query (Pro/Flash with minimal/low/medium/high thinking levels), brainstorm, analyze-code, analyze-text, summarize, structured (JSON extraction), extract.
Image generation & editing
Single-turn generate-image (4K, 10 aspect ratios), prompt optimizer, plus multi-turn edit sessions (start → continue → end) that maintain chat context.
Batch image gen — 50% cheaper
Submit 1–100 prompts asynchronously to Gemini's batch endpoint. Poll status, list, or cancel. Half the cost of the real-time API.
Video
Veo 2.0 generate-video and async check-video for status polling.
Research & web
search (Google Search + AI summary), deep-research multi-step mode, follow-up questions, plus URL analysis (analyze, compare, extract-from).
Documents & media
analyze-document, summarize-pdf, extract-tables, youtube + youtube-summary.
Caching
create-cache (file upload + TTL), query-cache, list-caches, delete-cache. Cuts cost for repeat large-context queries.
Code & speech
run-code (sandboxed Python execution); speak (TTS, 30+ voices), dialogue (multi-voice), list-voices, count-tokens.
- 01
Claude Code dispatches
Tool name + args sent over stdio; Zod validates the args against the tool's schema before any handler runs.
- 02
Server resolves handler
Tool registry maps name → handler function (registered at startup based on preset / enabled tools).
- 03
Gemini SDK call
genAI.models.generateContent() or batch endpoint. Auth, model selection, thinking level and caching headers configured per call.
- 04
Response shaping
Images base64-decoded and saved to ~/.cache/gemini-mcp/output (XDG-aware path); text/JSON wrapped as structured MCP content.
- 05
Return to Claude
{ content: [{ type: 'text' | 'image', ... }] } over stdio. Claude Code renders inline previews.
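The five steps above can be sketched as a single dispatch function. This is a minimal illustration under stated assumptions: the parse() method is a hand-rolled stand-in for a Zod schema, the handler echoes its input instead of calling the Gemini SDK, and the isError flag on failures is an addition that the text above does not mention.

```typescript
// MCP-style content items, matching the shape shown in step 05.
type ContentItem =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

interface ToolResult {
  content: ContentItem[];
  isError?: boolean; // assumption: used here to flag failures
}

interface ToolDef<A> {
  parse(raw: unknown): A; // stand-in for a Zod schema: returns args or throws
  handler(args: A): Promise<ToolResult>;
}

const registry = new Map<string, ToolDef<any>>();

registry.set("gemini-query", {
  parse(raw: unknown): { prompt: string } {
    const prompt = (raw as { prompt?: unknown } | null)?.prompt;
    if (typeof prompt !== "string" || prompt.length === 0) {
      throw new Error("prompt must be a non-empty string");
    }
    return { prompt };
  },
  // A real handler would call the Gemini SDK here; this one just echoes.
  async handler(args: { prompt: string }): Promise<ToolResult> {
    return { content: [{ type: "text", text: `echo: ${args.prompt}` }] };
  },
});

function errorResult(message: string): ToolResult {
  return { content: [{ type: "text", text: message }], isError: true };
}

async function dispatch(name: string, rawArgs: unknown): Promise<ToolResult> {
  // Step 02: resolve the handler (the tool's schema is needed for validation).
  const tool = registry.get(name);
  if (!tool) return errorResult(`Unknown tool: ${name}`);

  // Step 01: validate the args before the handler runs.
  let args: unknown;
  try {
    args = tool.parse(rawArgs);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return errorResult(`Invalid arguments: ${reason}`);
  }

  // Steps 03-05: call the handler and return its shaped content.
  return tool.handler(args);
}
```

Invalid arguments never reach the handler; they come back as a text content item describing the problem.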
Multi-turn image-edit sessions
An activeEditSessions Map stores a chat instance + last image base64 per session. Each continue-edit reuses that context — no manual thought-signature management; the SDK chat API handles continuity.
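A rough sketch of how such a session map might behave. Only the activeEditSessions name comes from the description above; EditChat is a hypothetical stand-in for the SDK chat object, the session ID format is invented, and the fake chat returns plain strings rather than real base64 image data.

```typescript
// Hypothetical stand-in for an SDK chat object that keeps its own history.
interface EditChat {
  send(instruction: string): Promise<string>; // resolves to the new image (base64)
}

interface EditSession {
  chat: EditChat;
  lastImageBase64: string;
}

const activeEditSessions = new Map<string, EditSession>();
let nextSessionId = 1;

async function startEdit(chat: EditChat, prompt: string): Promise<string> {
  const image = await chat.send(prompt);
  const id = `edit-${nextSessionId++}`;
  activeEditSessions.set(id, { chat, lastImageBase64: image });
  return id;
}

async function continueEdit(id: string, instruction: string): Promise<string> {
  const session = activeEditSessions.get(id);
  if (!session) throw new Error(`No active edit session: ${id}`);
  // The chat instance carries the prior turns; only the new instruction is sent.
  session.lastImageBase64 = await session.chat.send(instruction);
  return session.lastImageBase64;
}

function endEdit(id: string): boolean {
  return activeEditSessions.delete(id);
}

// Fake chat for demonstration: "renders" an image by joining the history.
function createFakeChat(): EditChat {
  const history: string[] = [];
  return {
    async send(instruction: string): Promise<string> {
      history.push(instruction);
      return history.join(" | ");
    },
  };
}
```

Because the state lives only in the Map, ending a session or restarting the process drops it, which matches the trade-off noted for in-memory edit sessions.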
Batch jobs at 50% off
Distinct tool group submits prompts to Gemini's async batch endpoint. activeBatchJobs Map tracks state for polling; jobs can be cancelled.
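The job-tracking side could look something like the sketch below. The activeBatchJobs name and the 1–100 prompt limit come from this page; the state names, function names and job ID handling are assumptions made for illustration, and no batch endpoint is actually called.

```typescript
// Illustrative state names; the real batch API's states may differ.
type BatchState = "pending" | "running" | "succeeded" | "failed" | "cancelled";

interface BatchJob {
  id: string;
  prompts: string[];
  state: BatchState;
}

const activeBatchJobs = new Map<string, BatchJob>();

function submitBatch(id: string, prompts: string[]): BatchJob {
  if (prompts.length < 1 || prompts.length > 100) {
    throw new Error("A batch must contain between 1 and 100 prompts");
  }
  const job: BatchJob = { id, prompts, state: "pending" };
  activeBatchJobs.set(id, job);
  return job;
}

// Returns undefined for unknown job IDs.
function checkBatch(id: string): BatchState | undefined {
  return activeBatchJobs.get(id)?.state;
}

function listBatches(): string[] {
  return Array.from(activeBatchJobs.keys());
}

// Only jobs that have not reached a terminal state can be cancelled.
function cancelBatch(id: string): boolean {
  const job = activeBatchJobs.get(id);
  if (!job) return false;
  if (job.state !== "pending" && job.state !== "running") return false;
  job.state = "cancelled";
  return true;
}
```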
Prompt caching
Dedicated cache tools upload files with a TTL, then subsequent queries reuse the cached context. This cuts cost for repeated queries against the same large documents.
Tool presets reduce context
Six presets (minimal / text / image / research / media / full) selected via env var. Image preset omits text tools, research preset omits generation — fewer schemas in Claude's context.
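One way the environment-driven selection might be resolved is sketched below. The variable names GEMINI_TOOL_PRESET and GEMINI_ENABLED_TOOLS and the six preset names appear on this page; the comma-separated list format, the precedence of the explicit list, and the fallback to "full" are assumptions.

```typescript
type Env = Record<string, string | undefined>;

const PRESET_NAMES = ["minimal", "text", "image", "research", "media", "full"] as const;
type Preset = (typeof PRESET_NAMES)[number];

type ToolSelection =
  | { kind: "preset"; preset: Preset }
  | { kind: "explicit"; tools: string[] };

// Pure function: pass process.env (or a test object) in as `env`.
function resolveToolSelection(env: Env): ToolSelection {
  // Assumption: an explicit comma-separated tool list wins over a preset.
  const explicit = env["GEMINI_ENABLED_TOOLS"];
  if (explicit !== undefined && explicit.trim() !== "") {
    const tools = explicit
      .split(",")
      .map((t) => t.trim())
      .filter((t) => t.length > 0);
    return { kind: "explicit", tools };
  }

  const requested = (env["GEMINI_TOOL_PRESET"] ?? "").trim().toLowerCase();
  const preset = PRESET_NAMES.find((p) => p === requested);
  // Assumption: fall back to "full" when the preset is unset or unrecognized.
  return { kind: "preset", preset: preset ?? "full" };
}
```

Keeping the resolver free of direct process.env access makes the selection logic easy to exercise without touching the real environment.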
Stdio reconnection
Transport detects closure and retries up to 5 times with exponential backoff (1 s → 10 s max) before giving up.
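A minimal sketch of a retry loop with these limits. The 5 attempts, 1 s starting delay and 10 s cap come from the description above; the doubling factor and the function names are assumptions, and connect is a caller-supplied placeholder for whatever re-establishes the transport.

```typescript
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 10000;

// Delay before the given 1-based attempt. Doubling per attempt is an assumption.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), MAX_DELAY_MS);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves with the number of attempts used, or rejects after MAX_ATTEMPTS.
async function reconnectWithBackoff(
  connect: () => Promise<void>,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<number> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    await sleep(backoffDelayMs(attempt));
    try {
      await connect();
      return attempt;
    } catch (err) {
      lastError = err;
    }
  }
  throw new Error(`Reconnect failed after ${MAX_ATTEMPTS} attempts: ${String(lastError)}`);
}
```

With these constants the waits are 1 s, 2 s, 4 s, 8 s and 10 s (the fifth would be 16 s without the cap). The injectable sleep parameter exists only so the schedule can be checked without real waiting.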
Dual-mode binary
Same dist/index.js runs as MCP server (no args) or standalone gcli CLI. Single artifact, two surfaces.
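The mode switch can be pictured as a small pure function over the argument list. This is a sketch, not the project's dispatcher: selectMode and the Mode type are hypothetical names, and the subcommand shown in the usage note is invented.

```typescript
type Mode =
  | { kind: "mcp-server" }
  | { kind: "cli"; subcommand: string; rest: string[] };

// `argv` excludes the node binary and script path, i.e. process.argv.slice(2).
function selectMode(argv: string[]): Mode {
  const [subcommand, ...rest] = argv;
  // No arguments: run as an MCP server over stdio.
  if (subcommand === undefined) return { kind: "mcp-server" };
  // Any arguments: treat the first one as a CLI subcommand.
  return { kind: "cli", subcommand, rest };
}
```

For example, selectMode([]) selects the server, while selectMode(["query", "hello"]) selects the CLI with a hypothetical query subcommand.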
What I optimized for, and what each choice cost.
stdio over SSE/HTTP
stdio is the simplest, single-process MCP transport — works everywhere Claude Code runs, no port management. Cost: no remote sharing across machines; each user runs their own server.
Tool presets over loading all 37
Presets (minimal / text / image / research / media / full) keep Claude's context lean: fewer schemas mean more usable context for actual work. Cost: users may not discover tools outside the preset they enabled.
In-memory edit sessions over persistent
Multi-turn image edit state lives in a Map. Cheap, fast, zero infra. Cost: server restart loses sessions; not durable across crashes.
Path-based registration over npm publish
Direct source deployment lets power users patch tool definitions and iterate fast. Cost: harder install for newcomers — there's no npx one-liner.
- Clone repo + npm install --ignore-scripts
- npx tsc → dist/index.js
- claude mcp add gemini -s user -- node dist/index.js
- GEMINI_API_KEY env var
- GEMINI_PRO_MODEL · GEMINI_IMAGE_MODEL overrides
- GEMINI_TOOL_PRESET or GEMINI_ENABLED_TOOLS
Stage: Open source. Path-based registration in Claude Code; not yet npm/pip-published. Designed for direct source deployment so power users can patch tool definitions easily.
A clean MCP server is a clean integration layer — same skill set, different protocol.
Zod-validated tool schemas are the same primitive as strict input validation on Mendix microflows and REST endpoints — a contract enforced at the boundary, not trusted from the caller. The tool-registry pattern (group → register fn → schema → handler) maps directly to Mendix module and connector design. Environment-variable presets are exactly how enterprise apps configure per-environment behavior across dev / acceptance / production. The same person who writes a 37-tool MCP server with type-safe schemas also writes a robust SAP connector.