model brief

Google DeepMind

fast tier · 2024

Gemini 2.0 Flash

Speed-focused Gemini tier for high-traffic workloads with strong multimodal coverage.

Context window

1M tokens (streaming) / 128k cached context

Peak context for this model.

Availability

Google AI Studio, Vertex AI

Where you can run it.

Modalities

Text · Vision · Audio · Code

Input/output coverage.

Pricing

$0.10 / 1M input tokens, $0.40 / 1M output tokens

Latency: Very low; designed for real-time experiences
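
As a rough illustration of how the listed per-token prices turn into request costs, the sketch below applies the two rates from this card to a workload. The function name and the example workload (10,000 requests of 2,000 input and 500 output tokens each) are assumptions for illustration only; actual billing may include factors this card does not cover.

```python
# Prices taken from the card above, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical workload: 10,000 requests, each with 2,000 input
# and 500 output tokens.
per_request = estimate_cost_usd(2_000, 500)
total = per_request * 10_000
print(f"per request: ${per_request:.6f}, total: ${total:.2f}")
```

Because output tokens cost four times as much as input tokens at these rates, long generations tend to dominate the bill more than long prompts do.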

Strengths

  • Very low latency with competitive reasoning for its size.
  • Great at summarization, classification, and extraction tasks.
  • Optimized streaming responses for interactive UIs.

Best for

  • Support chat, quick Q&A, and transactional responses.
  • Summaries and labeling over documents, tickets, and recordings.
  • Agent warmups, pre-routing, and pre-processing before heavier calls.
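
The pre-routing use case above can be sketched as a small dispatcher: a cheap classification step decides whether a request stays on the fast tier or is escalated to a heavier model. Everything here is hypothetical. The tier labels, the `pre_route` function, and the keyword-based `keyword_classifier` are stand-ins; in a real system the classifier would presumably be a call to the fast model itself, which this sketch does not make.

```python
from typing import Callable


def keyword_classifier(prompt: str) -> str:
    """Stand-in for a fast-tier classification call.

    Returns 'complex' if the prompt contains a marker phrase that
    suggests heavier reasoning, otherwise 'simple'. The marker list
    is an illustrative assumption, not a recommended heuristic.
    """
    hard_markers = ("prove", "refactor", "step by step")
    lowered = prompt.lower()
    return "complex" if any(m in lowered for m in hard_markers) else "simple"


def pre_route(
    prompt: str,
    classify: Callable[[str], str] = keyword_classifier,
) -> str:
    """Return which tier should answer the prompt: 'fast' or 'heavy'."""
    return "heavy" if classify(prompt) == "complex" else "fast"


print(pre_route("What are your opening hours?"))
print(pre_route("Refactor this module and explain it step by step."))
```

Passing the classifier in as a parameter keeps the routing rule separate from how the label is produced, so the keyword stub can be swapped for a model call without changing the dispatcher.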

Summary

  • Tier: fast
  • Release: 2024
  • Latency: Very low; designed for real-time experiences