fast tier · 2024
Google DeepMind
Gemini 2.0 Flash
Speed-focused Gemini tier for high-traffic workloads with strong multimodal coverage.
Context window
1M tokens (streaming) / 128k cached context
Peak context for this model.
Availability
Google AI Studio, Vertex AI
Where you can run it.
Modalities
Text · Vision · Audio · Code
Input/output coverage.
Pricing
$0.10 / 1M input tokens, $0.40 / 1M output tokens
Latency: Very low; designed for real-time experiences
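As a rough illustration of what the listed rates mean in practice, the sketch below estimates spend for a batch job. The rates come from the pricing line above; the function name and the workload figures (10,000 tickets at about 2,000 input and 200 output tokens each) are assumptions chosen for the example, not measured values.

```python
# Rates from the pricing line above, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical workload: 10,000 tickets, ~2,000 input and ~200 output tokens each.
tickets = 10_000
total = estimate_cost_usd(tickets * 2_000, tickets * 200)
print(f"${total:.2f}")  # prints "$2.80"
```

Output tokens cost four times as much as input tokens at these rates, so short, structured outputs (labels, brief summaries) keep the bill dominated by the cheaper input side.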
Strengths
- Very low latency with competitive reasoning for its size.
- Great at summarization, classification, and extraction tasks.
- Optimized streaming responses for interactive UIs.
Best for
- Support chat, quick Q&A, and transactional responses.
- Summaries and labeling over documents, tickets, and recordings.
- Agent warm-up steps, request pre-routing, and pre-processing before calls to heavier models.
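The pre-routing use case can be sketched as a small dispatcher: a cheap classification step decides whether a request stays on the fast tier or is escalated to a heavier model. Everything here is hypothetical, including the `route` function, the `simple`/`complex` labels, and the three callables, which stand in for real model calls and are not part of any SDK.

```python
from typing import Callable

# Hypothetical labels the fast classification step is expected to return.
SIMPLE, COMPLEX = "simple", "complex"


def route(
    prompt: str,
    classify: Callable[[str], str],
    fast_model: Callable[[str], str],
    heavy_model: Callable[[str], str],
) -> str:
    """Escalate to the heavy model only when the classifier labels the prompt complex."""
    label = classify(prompt).strip().lower()
    if label == COMPLEX:
        return heavy_model(prompt)
    # Any other label, including unexpected ones, stays on the fast tier.
    return fast_model(prompt)


# Stubs standing in for real model calls (assumptions for this example only).
def stub_classify(prompt: str) -> str:
    return COMPLEX if len(prompt.split()) > 50 else SIMPLE


def stub_fast(prompt: str) -> str:
    return "fast:" + prompt


def stub_heavy(prompt: str) -> str:
    return "heavy:" + prompt


print(route("Where is my order?", stub_classify, stub_fast, stub_heavy))
```

Defaulting to the fast tier on unrecognized labels keeps latency and cost low when the classifier misbehaves; a deployment that values answer quality over cost might prefer the opposite default.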
Summary
- Tier: fast
- Release: 2024
- Latency: Very low; designed for real-time experiences