4032
model brief

Meta

open-weight tier · 2024

Meta

Llama 3.2 90B

Open-weight Llama 3.2 model with strong reasoning for an open license footprint.

Context window

128k tokens

Peak context for this model.

Availability

Self-hosted, cloud marketplaces, supported by major GPU providers

Where you can run it.

Modalities

Text · Code

Input/output coverage.

Pricing

Open-weight (no per-token licensing)

Latency: Varies by host; scales across GPU clusters

Strengths

  • High quality for an open-weight model with competitive reasoning.
  • Supports fine-tuning and RAG pipelines on self-hosted infra.
  • Transparent licensing for on-prem or VPC deployments.

Best for

  • Teams that need vendor-neutral, controllable deployments.
  • Private RAG stacks with custom tuning and observability.
  • Cost-controlled batch inference across dedicated GPUs.

Summary

  • Tier: open-weight
  • Release: 2024
  • Latency: Varies by host; scales across GPU clusters