x-ai · Mar 31, 2026

xAI: Grok 4.20

x-ai/grok-4.20

Context: 2M
Max Output: —
Input / 1M: $1.25
Output / 1M: $2.50

About

Grok 4.20 is xAI's flagship model, emphasizing inference speed and agentic tool calling. xAI positions it for strict prompt adherence and a low hallucination rate. On the Artificial Analysis Omniscience test it reached a 78% non-hallucination rate, the highest among models tested, and it ranked second on τ²-Bench Telecom, an agentic tool-use benchmark, with a score of 97%. Reasoning can be enabled or disabled via the reasoning enabled parameter in the API. The model accepts text, image, and file inputs and supports a 2,000,000-token context window.

Capabilities

Context Length: 2M
Max Output: —
Reasoning: Yes
In: text, image, file
Out: text

Benchmarks

Intelligence Index: 37.0 (#40 of 133)
Coding Index: 40.5 (#37 of 118)

Reasoning & Knowledge

GPQA Diamond: 91.1%
HLE: 32.2%

Coding & Agentic

SciCode: 45.6%
Terminal-Bench Hard: 37.9%

Source: Artificial Analysis

Pricing

Type (USD per 1M tokens)	≤200K tokens	>200K tokens
Input	$1.25	$2.50
Output	$2.50	$5.00
Cache Read	$0.20	$0.40

Web Search

$0.0050 / call
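
The tiered rates above can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the function name `estimate_cost` is hypothetical, and it assumes (the page does not say) that the 200K threshold is checked against the total prompt size, that the higher tier then applies to the whole request, and that cached tokens are counted inside `input_tokens`.

```python
# Rates in USD per 1M tokens, copied from the pricing table.
TIER_THRESHOLD_TOKENS = 200_000
RATES = {
    "standard": {"input": 1.25, "output": 2.50, "cache_read": 0.20},  # <=200K
    "long": {"input": 2.50, "output": 5.00, "cache_read": 0.40},      # >200K
}
WEB_SEARCH_USD_PER_CALL = 0.005


def estimate_cost(input_tokens, output_tokens, cached_tokens=0, web_search_calls=0):
    """Estimate the USD cost of one request.

    ASSUMPTIONS (not confirmed by the page): the tier is chosen by total
    prompt size, and cached_tokens is a subset of input_tokens.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    tier = "long" if input_tokens > TIER_THRESHOLD_TOKENS else "standard"
    rates = RATES[tier]

    uncached_tokens = input_tokens - cached_tokens
    token_cost = (
        uncached_tokens * rates["input"]
        + cached_tokens * rates["cache_read"]
        + output_tokens * rates["output"]
    ) / 1_000_000
    return token_cost + web_search_calls * WEB_SEARCH_USD_PER_CALL


if __name__ == "__main__":
    # 100K prompt tokens and 10K output tokens, standard tier.
    print(f"${estimate_cost(100_000, 10_000):.4f}")
    # 300K prompt tokens pushes the request into the >200K tier.
    print(f"${estimate_cost(300_000, 10_000):.4f}")
```

If the provider instead bills only the tokens beyond 200K at the higher rate, the tier selection line would need to change; check an actual invoice before relying on these numbers.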

API

OpenAI-compatible · Model ID x-ai/grok-4.20

curl https://api.elliotgate.com/v1/chat/completions \
  -H "Authorization: Bearer sk-omg-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "x-ai/grok-4.20",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
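
The same request can be built in Python, with the reasoning toggle mentioned in the About section added. This is a sketch under stated assumptions: the helper `build_chat_request` is hypothetical, and the field shape `"reasoning": {"enabled": ...}` is a guess at how the "reasoning enabled parameter" is spelled; the page does not show the exact name, so confirm it against the API reference. The code only constructs the request and does not send it.

```python
import json
import urllib.request

# Endpoint and model ID are taken from the curl example above.
API_URL = "https://api.elliotgate.com/v1/chat/completions"
MODEL_ID = "x-ai/grok-4.20"


def build_chat_request(prompt, api_key, reasoning_enabled=None):
    """Build (but do not send) a chat completion request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_enabled is not None:
        # ASSUMPTION: the toggle is a nested "reasoning.enabled" boolean.
        # The source only says "the reasoning enabled parameter".
        payload["reasoning"] = {"enabled": reasoning_enabled}

    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key, as in the curl example.
req = build_chat_request("Hello!", "sk-omg-your-api-key", reasoning_enabled=False)
print(json.dumps(json.loads(req.data), indent=2))
```

To actually send it, pass `req` to `urllib.request.urlopen` with a real API key and parse the JSON response body.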

OFTEN COMPARED

Grok 4.20 comparisons

Decide which model wins on the dimensions that matter for your workload — context, benchmarks, pricing, or serving latency.

Grok 4.20 vs MiMo-V2-Pro

Grok 4.20 vs GPT-5.2-Codex

Grok 4.20 vs MiniMax M2.7

Grok 4.20 vs GPT-5.4 Mini

Grok 4.20 vs GLM 5

Grok 4.20 vs Qwen3.6 Plus

Grok 4.20 vs GPT-5.1 Chat

Grok 4.20 vs GPT-5.1