Open Source

Gemini MCP

37+ tools bridging Google Gemini into Claude Code — text, image, video, audio, deep research, batch jobs, multi-turn image editing and prompt caching, all over stdio.

Demonstrates integration architecture with type-safe contracts: Zod-validated tool schemas, modular registration, env-driven presets, and multi-turn session state.

~/dev/mcp-servers/gemini-mcp — node dist/index.js

$ claude mcp add gemini -s user -- node dist/index.js
→ MCP server: gemini · stdio
[gemini] registered 37 tools across 19 groups
[gemini] preset: full
[gemini] models: pro=gemini-3-pro · image=nano-banana-pro · video=veo-2.0

// Tool groups
text       query · brainstorm · summarize · structured · extract
image-gen  generate · image-prompt · start/continue/end-edit
batch      batch-generate · check · list · cancel   // 50% cheaper
video      generate · check (Veo 2.0)
research   search · deep-research · followup · check
media      analyze-image · analyze-document · youtube · tables
cache      create · query · list · delete
code       run-code (sandboxed)
speech     speak · dialogue · list-voices · count-tokens

[gemini] ready · stdio reconnect: 5 retries · exp backoff 1s → 10s

37+ MCP tools

19 tool groups

50% batch cost savings

6 tool presets

Overview

An MCP (Model Context Protocol) server that gives any MCP-aware client, primarily Claude Code, direct access to Google Gemini's capabilities. More than a thin wrapper, it exposes specialized features: multi-turn image editing sessions that retain context across calls, the asynchronous batch API at half the price, prompt caching, deep research mode, sandboxed code execution, and a TTS catalog of 30+ voices.

One binary runs in two modes: MCP server (default, stdio) or standalone CLI (when invoked with subcommands).

Stack

Language: TypeScript 5.3 (ESM) · Node ≥18 · Bun ≥1.2 for build/dev

MCP SDK: @modelcontextprotocol/sdk 1.22

Gemini SDK: @google/genai 1.34

Transport: stdio (standard MCP)

Validation: Zod 3.24 + zod-to-json-schema for tool argument schemas

Build: tsc → dist/index.js with shebang (executable)

Repo: github.com/Borys520/gemini-mcp-server

Architecture

Modular layers with a central registry: a tool-groups config maps group IDs to register functions and preset names to sets of groups; each tool module wires up server.tool() with a Zod schema and a Gemini API call.

src/
├── index.ts            Dual-mode dispatcher — MCP server (default) or CLI (gcli)
├── server.ts           MCP init · tool registry · stdio transport
│                       5-attempt exponential backoff reconnection
├── gemini-client.ts    GoogleGenAI instance · 4 model slots
│                       (Pro · Flash · Image · Video) · ThinkingLevel
├── tool-groups.ts      Maps 19 group IDs → register fns
│                       6 presets: minimal · text · image · research · media · full
├── tools/              25 modules · each calls server.tool() + Zod schema
│   ├── query.ts        gemini-query (Pro/Flash, thinking levels)
│   ├── image-gen.ts    generate-image, image-prompt
│   ├── batch-image-gen.ts  batch-generate · check · list · cancel
│   ├── image-edit.ts   start/continue/end-edit (multi-turn sessions)
│   ├── video.ts        generate-video, check-video (Veo 2.0)
│   ├── research.ts     search, deep-research, followup, check
│   ├── analyze.ts      images, documents, urls, tables, youtube
│   ├── cache.ts        create, query, list, delete
│   ├── code.ts         run-code (sandboxed)
│   └── speech.ts       speak, dialogue, list-voices, count-tokens
└── utils/              Logger (verbose/quiet) · output-dir (XDG-aware)
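
The group → register fn pattern can be sketched without the SDK. This is a minimal illustration, not the project's code: ToolServer is a hypothetical stand-in for the server.tool() surface of @modelcontextprotocol/sdk, the handlers are stubs instead of Gemini calls, and only two of the 19 groups are shown.

```typescript
// Hypothetical stand-in for the part of the MCP server a tool module touches.
// The real project calls server.tool() with a Zod schema; that is omitted here.
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

interface ToolServer {
  tool(toolName: string, handler: ToolHandler): void;
}

type RegisterFn = (server: ToolServer) => void;

// Group ID -> register function. Handlers are stubs, not Gemini API calls.
const toolGroups = new Map<string, RegisterFn>([
  ["text", (server) => {
    server.tool("gemini-query", async (args) => `stub answer for: ${String(args["prompt"])}`);
  }],
  ["image-gen", (server) => {
    server.tool("generate-image", async () => "stub image result");
  }],
]);

// Run the register function of every requested group against the server.
function registerGroups(server: ToolServer, groupIds: string[]): void {
  for (const id of groupIds) {
    const register = toolGroups.get(id);
    if (!register) throw new Error(`Unknown tool group: ${id}`);
    register(server);
  }
}

// Usage: a recording server that only remembers what was registered.
const registeredTools = new Map<string, ToolHandler>();
const recordingServer: ToolServer = {
  tool: (toolName, handler) => { registeredTools.set(toolName, handler); },
};
registerGroups(recordingServer, ["text"]);
```

Keeping registration behind one function per group is what lets a preset switch whole groups on or off without touching the tool modules.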

Tool catalog

Text & reasoning

query (Pro/Flash with minimal/low/medium/high thinking levels), brainstorm, analyze-code, analyze-text, summarize, structured (JSON extraction), extract.

Image generation & editing

Single-turn generate-image (4K, 10 aspect ratios), prompt optimizer, plus multi-turn edit sessions (start → continue → end) that maintain chat context.

Batch image gen — 50% cheaper

Submit 1–100 prompts asynchronously to Gemini's batch endpoint. Poll status, list, or cancel. Half the cost of the real-time API.

Video

Veo 2.0 generate-video and async check-video for status polling.

Research & web

search (Google Search + AI summary), deep-research multi-step mode, follow-up questions, plus URL analysis (analyze, compare, extract-from).

Documents & media

analyze-document, summarize-pdf, extract-tables, youtube + youtube-summary.

Caching

create-cache (file upload + TTL), query-cache, list-caches, delete-cache. Cuts cost for repeat large-context queries.

Code & speech

run-code (sandboxed Python execution); speak (TTS, 30+ voices), dialogue (multi-voice), list-voices, count-tokens.

Tool call flow

01
Claude Code dispatches
Tool name + args sent over stdio; Zod validates the args against the tool's schema before any handler runs.
02
Server resolves handler
Tool registry maps name → handler function (registered at startup based on preset / enabled tools).
03
Gemini SDK call
genAI.models.generateContent() or batch endpoint. Auth, model selection, thinking level and caching headers configured per call.
04
Response shaping
Images base64-decoded and saved to ~/.cache/gemini-mcp/output (XDG-aware path); text/JSON wrapped as structured MCP content.
05
Return to Claude
{ content: [{ type: 'text' | 'image', ... }] } over stdio. Claude Code renders inline previews.
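
Steps 04 and 05 can be sketched as a small shaping function. The text and image content shapes (type, text, data, mimeType) and the isError flag follow the MCP tool-result format; GeminiOutput and shapeResult are hypothetical names for illustration, and the "image/png" default is an assumption.

```typescript
// Content items as defined by the MCP tool-result format.
type McpContent =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string }; // data is base64

interface ToolResult {
  content: McpContent[];
  isError?: boolean;
}

// Hypothetical summary of what a Gemini call produced.
interface GeminiOutput {
  text?: string;
  imageBase64?: string;
  imageMimeType?: string;
  savedPath?: string; // where the decoded image was written, if anywhere
}

// Wrap Gemini output as MCP content; flag an empty response as an error.
function shapeResult(out: GeminiOutput): ToolResult {
  const content: McpContent[] = [];
  if (out.text) content.push({ type: "text", text: out.text });
  if (out.imageBase64) {
    content.push({
      type: "image",
      data: out.imageBase64,
      mimeType: out.imageMimeType ?? "image/png", // assumed default
    });
    if (out.savedPath) content.push({ type: "text", text: `Saved to ${out.savedPath}` });
  }
  if (content.length === 0) {
    return { content: [{ type: "text", text: "Gemini returned no content" }], isError: true };
  }
  return { content };
}
```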

Notable engineering

Multi-turn image-edit sessions

An activeEditSessions Map stores a chat instance + last image base64 per session. Each continue-edit reuses that context — no manual thought-signature management; the SDK chat API handles continuity.
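
A minimal sketch of that session map, assuming a chat object that keeps its own history. ChatLike is a hypothetical stand-in for the SDK chat instance, and the three function names only mirror the start/continue/end tools; the real handlers will differ.

```typescript
// Hypothetical stand-in for the SDK chat object, assumed to track history itself.
interface ChatLike {
  sendMessage(prompt: string): Promise<{ imageBase64: string }>;
}

interface EditSession {
  chat: ChatLike;
  lastImageBase64: string;
}

const activeEditSessions = new Map<string, EditSession>();

// Start a session: first message creates the image, state goes into the Map.
async function startEdit(sessionId: string, chat: ChatLike, prompt: string): Promise<string> {
  const { imageBase64 } = await chat.sendMessage(prompt);
  activeEditSessions.set(sessionId, { chat, lastImageBase64: imageBase64 });
  return imageBase64;
}

// Continue a session: reuse the stored chat so earlier turns stay in context.
async function continueEdit(sessionId: string, prompt: string): Promise<string> {
  const session = activeEditSessions.get(sessionId);
  if (!session) throw new Error(`No active edit session: ${sessionId}`);
  const { imageBase64 } = await session.chat.sendMessage(prompt);
  session.lastImageBase64 = imageBase64;
  return imageBase64;
}

// End a session: returns false if the ID was not active.
function endEdit(sessionId: string): boolean {
  return activeEditSessions.delete(sessionId);
}
```

Because the state lives only in process memory, every session disappears when the server restarts.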

Batch jobs at 50% off

A distinct tool group submits prompts to Gemini's async batch endpoint. An activeBatchJobs Map tracks state for polling; jobs can be cancelled.
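
The tracking side can be sketched as follows. This is an illustration under stated assumptions: the call to the batch endpoint is left out entirely, job IDs are generated locally (real IDs would come from the API), and the state names are hypothetical. Only the 1–100 prompt limit and the Map come from the description above.

```typescript
type BatchState = "pending" | "succeeded" | "cancelled"; // hypothetical state names

interface BatchJob {
  id: string;
  prompts: string[];
  state: BatchState;
}

const activeBatchJobs = new Map<string, BatchJob>();
let nextJobNumber = 1; // local counter; a real ID would come from the batch API

// Validate the prompt count and start tracking a job (API submission omitted).
function submitBatch(prompts: string[]): BatchJob {
  if (prompts.length < 1 || prompts.length > 100) {
    throw new RangeError(`Expected 1-100 prompts, got ${prompts.length}`);
  }
  const job: BatchJob = { id: `job-${nextJobNumber++}`, prompts, state: "pending" };
  activeBatchJobs.set(job.id, job);
  return job;
}

// Look up the tracked state; undefined means the job is unknown.
function checkBatch(id: string): BatchState | undefined {
  return activeBatchJobs.get(id)?.state;
}

function listBatches(): BatchJob[] {
  return Array.from(activeBatchJobs.values());
}

// Only pending jobs can be cancelled.
function cancelBatch(id: string): boolean {
  const job = activeBatchJobs.get(id);
  if (!job || job.state !== "pending") return false;
  job.state = "cancelled";
  return true;
}
```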

Prompt caching

Dedicated cache tools upload files with a TTL. Subsequent queries reuse the cached context instead of resending the files, which lowers cost for repeated large-document queries.

Tool presets reduce context

Six presets (minimal / text / image / research / media / full) selected via env var. Image preset omits text tools, research preset omits generation — fewer schemas in Claude's context.
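
A sketch of how the env vars might resolve to a list of groups. The variable names GEMINI_TOOL_PRESET and GEMINI_ENABLED_TOOLS and the six preset names come from this page; everything else is an assumption made for illustration: the group membership of each preset, the comma-separated list format, the explicit list taking precedence, and "full" as the fallback.

```typescript
type Env = Record<string, string | undefined>;

// Illustrative membership only; the real presets cover 19 groups.
const PRESET_GROUPS: Record<string, string[]> = {
  minimal: ["text"],
  text: ["text", "code"],
  image: ["image-gen", "batch"],
  research: ["text", "research"],
  media: ["media", "speech"],
  full: ["text", "image-gen", "batch", "video", "research", "media", "cache", "code", "speech"],
};

// Resolve which groups to register. Assumed rules: an explicit comma-separated
// list wins over a preset, and a missing preset falls back to "full".
function resolveEnabledGroups(env: Env): string[] {
  const explicit = env["GEMINI_ENABLED_TOOLS"];
  if (explicit) {
    return explicit.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
  }
  const preset = env["GEMINI_TOOL_PRESET"] ?? "full";
  const groups = PRESET_GROUPS[preset];
  if (!groups) throw new Error(`Unknown preset: ${preset}`);
  return groups;
}
```

In the server this would be called once at startup, for example with process.env, before any tool is registered.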

Stdio reconnection

Transport detects closure and retries up to 5 times with exponential backoff (1 s → 10 s max) before giving up.
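
The retry schedule can be sketched as below. The 5 retries, the 1 s start, and the 10 s cap come from the description above; the doubling factor is an assumption, and connectWithRetry with its injected connect and sleep functions is a hypothetical shape chosen so the schedule can be exercised without a real transport.

```typescript
const MAX_RETRIES = 5;
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 10000;

// Delay before retry number attempt + 1: 1 s, 2 s, 4 s, 8 s, then capped at 10 s.
// Doubling per attempt is an assumption; only the 1 s and 10 s bounds are given.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Try to connect; on failure wait and retry, up to MAX_RETRIES retries.
// Resolves with the number of retries used, rejects with the last error.
async function connectWithRetry(
  connect: () => Promise<void>,
  sleep: (ms: number) => Promise<void>,
): Promise<number> {
  for (let attempt = 0; ; attempt++) {
    try {
      await connect();
      return attempt;
    } catch (err) {
      if (attempt >= MAX_RETRIES) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

With a connect function that always fails, this makes six attempts in total (the initial one plus five retries) and waits 1, 2, 4, 8, and 10 seconds between them.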

Dual-mode binary

Same dist/index.js runs as MCP server (no args) or standalone gcli CLI. Single artifact, two surfaces.
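
The dispatch decision reduces to one check on the arguments. A minimal sketch, assuming the entry point passes in the arguments after the script path (for example process.argv.slice(2)); resolveRunMode and the RunMode shape are hypothetical names, and no real gcli subcommands are implied.

```typescript
type RunMode =
  | { kind: "mcp-server" }
  | { kind: "cli"; subcommand: string; rest: string[] };

// No arguments means "serve MCP over stdio"; anything else is treated as a CLI call.
function resolveRunMode(argv: string[]): RunMode {
  const [subcommand, ...rest] = argv;
  if (subcommand === undefined) return { kind: "mcp-server" };
  return { kind: "cli", subcommand, rest };
}
```

Keeping the decision in a pure function makes the two surfaces easy to test without spawning the binary.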

Trade-offs

What I optimized for, and what each choice cost.

stdio over SSE/HTTP

stdio is the simplest, single-process MCP transport — works everywhere Claude Code runs, no port management. Cost: no remote sharing across machines; each user runs their own server.

Tool presets over loading all 37

Presets (minimal / text / image / research / media / full) keep Claude's context lean: fewer schemas mean more usable context for actual work. Cost: users may not discover tools outside the preset they enabled.

In-memory edit sessions over persistent

Multi-turn image edit state lives in a Map. Cheap, fast, zero infra. Cost: server restart loses sessions; not durable across crashes.

Path-based registration over npm publish

Direct source deployment lets power users patch tool definitions and iterate fast. Cost: harder install for newcomers — there's no npx one-liner.

Distribution

Install

Clone repo + npm install --ignore-scripts
npx tsc → dist/index.js
claude mcp add gemini -s user -- node dist/index.js

Configure

GEMINI_API_KEY env var
GEMINI_PRO_MODEL · GEMINI_IMAGE_MODEL overrides
GEMINI_TOOL_PRESET or GEMINI_ENABLED_TOOLS

Stage: Open source. Path-based registration in Claude Code; not yet npm/pip-published. Designed for direct source deployment so power users can patch tool definitions easily.

Crossover to enterprise work

A clean MCP server is a clean integration layer — same skill set, different protocol.

Zod-validated tool schemas are the same primitive as strict input validation on Mendix microflows and REST endpoints — a contract enforced at the boundary, not trusted from the caller. The tool-registry pattern (group → register fn → schema → handler) maps directly to Mendix module and connector design. Environment-variable presets are exactly how enterprise apps configure per-environment behavior across dev / acceptance / production. The same person who writes a 37-tool MCP server with type-safe schemas also writes a robust SAP connector.