Numbers, not promises.

Reproducible head-to-head measurements of Relay against the main macOS terminals, including the scenarios where Relay loses.

Hardware: MacBook Pro · M3 Max · 32 GB
OS: macOS 15 Sequoia
Measured: 2026-05-26

Results

Six scenarios, each measured across multiple runs. Tinted row = Relay.

Cold start

lower is better

Time from app launch to a fully rendered window. These are warm-cache numbers: not the very first launch after install, but the typical day-to-day value.

App | mean (ms) | min (ms) | max (ms)
Relay
iTerm2
Ghostty
Terminal.app

Measurement pending. This dashboard is re-run with every Relay release.
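As a rough illustration of how the three columns above could be derived from raw run timings, here is a minimal sketch. The helper name `summarize_ms` and the sample values are hypothetical; this is not taken from the Relay benchmark repo, whose scripts may aggregate differently.

```shell
#!/bin/sh
# Hypothetical helper: read one launch time in milliseconds per line on stdin
# and print the mean, min, and max over all runs.
summarize_ms() {
  awk '
    NR == 1 { min = $1; max = $1 }
    {
      sum += $1
      if ($1 < min) min = $1
      if ($1 > max) max = $1
    }
    END {
      if (NR == 0) { print "no samples"; exit 1 }
      printf "mean=%.1f min=%d max=%d\n", sum / NR, min, max
    }'
}

# Made-up sample values for illustration only, not real measurements.
printf '%s\n' 210 190 230 | summarize_ms
# prints: mean=210.0 min=190 max=230
```

Reporting min and max next to the mean makes outlier runs visible instead of averaging them away.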

RAM at idle

lower is better

Memory footprint of a single empty pane after 60 seconds of stabilization. This measures the baseline cost of each tool.

App | RSS (MB) | processes
Relay
iTerm2
Ghostty
Terminal.app

Measurement pending. This dashboard is re-run with every Relay release.
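One way to obtain an RSS total and a process count like the columns above is to sum the resident set size that `ps` reports for every process belonging to an app. The sketch below assumes the app's processes can be found with a `pgrep -f` pattern; the function names and the pattern are hypothetical, and the actual benchmark scripts may collect memory differently.

```shell
#!/bin/sh
# Hypothetical helper: read one RSS value in KiB per line on stdin and print
# "<total MB> <process count>". ps reports rss in KiB on macOS and Linux.
sum_rss() {
  awk 'NF { kb += $1; n++ } END { printf "%.1f %d\n", kb / 1024, n }'
}

# Hypothetical collector: sum RSS over every process whose command line
# matches the given pattern. Runs in a subshell so loop variables stay local.
# Caveat: pgrep -f matches full command lines, so a loose pattern can pick up
# unrelated processes; check the match list before trusting the total.
app_rss() (
  for pid in $(pgrep -f "$1"); do
    ps -o rss= -p "$pid"
  done | sum_rss
)

# Example (pattern is an assumption, adjust per app):
#   sleep 60 && app_rss "Ghostty"
```

Counting helper processes matters because some terminals split work across several processes, so a single-PID reading would understate their footprint.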

RAM with 4 panes

lower is better

Memory footprint in a workspace with four active panes. This is the scenario Relay was built for — the question is how efficiently other terminals scale.

App | RSS (MB) | per pane (MB)
Relay
iTerm2
Ghostty
Terminal.app | no native splits

Measurement pending. This dashboard is re-run with every Relay release.

Throughput · 20 MB log dump

lower time is better

How long the terminal takes to render a 20 MB log file. Stress test for the text pipeline.

App | time (s) | MB/s
Relay
iTerm2
Ghostty
Terminal.app

Measurement pending. This dashboard is re-run with every Relay release.
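The two columns above are related by simple arithmetic: throughput is the file size divided by the elapsed time. The sketch below shows that conversion, assuming "20 MB" means 20 × 1024 × 1024 bytes; the function name and the commented commands for producing and dumping a test file are illustrative assumptions, not the method used in benchmark.sh.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count and elapsed seconds into MB/s,
# treating 1 MB as 1048576 bytes (an assumption about the dashboard's units).
throughput_mbs() {
  awk -v bytes="$1" -v secs="$2" 'BEGIN {
    if (secs <= 0) { print "n/a"; exit 1 }
    printf "%.1f\n", bytes / 1048576 / secs
  }'
}

# Illustrative manual run inside the terminal under test:
#   yes "2026-05-26 12:00:00 INFO sample log line" | head -c 20971520 > dump.log
#   time cat dump.log
# Then feed the wall-clock seconds into throughput_mbs. Note that `time cat`
# ends when the terminal has accepted the bytes, which may be slightly before
# the last frame is drawn.

throughput_mbs 20971520 2
# prints: 10.0
```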

vtebench · VT-sequence processing

We lose to GPU terminals here

Industry-standard benchmark from the Alacritty project. Pure CPU work for the VT-sequence parser. We include it because honest comparison matters more than selective silence.

App | dense_cells | light_cells | scrolling | unicode
Relay
iTerm2
Ghostty
Terminal.app

Measurement pending. This dashboard is re-run with every Relay release.

Where Relay loses

Three scenarios where other terminals objectively beat Relay. This section exists because a benchmark without an honest weakness list is worthless.

Methodology

Four decisions that shape every number in the tables above.

Warm cache, not cold boot

Cold-start values are measured after a warmup pass, so they reflect daily use, not the first launch after a reboot.
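The warmup-then-measure pattern can be sketched as a small loop: run the command once and discard the result, then time the following runs. Everything here is an assumption for illustration: the function names are hypothetical, `perl` with `Time::HiRes` is used only as a portable millisecond clock, and timing a command's exit is a stand-in for detecting a fully rendered window, which the real suite has to handle separately.

```shell
#!/bin/sh
# Current time in milliseconds. perl's Time::HiRes is used because BSD date
# on macOS has no sub-second format.
now_ms() {
  perl -MTime::HiRes=time -e 'printf "%d\n", time() * 1000'
}

# time_runs N CMD...: run CMD once as a warmup (not recorded), then N timed
# runs, printing one elapsed-milliseconds value per line. Runs in a subshell
# so its variables do not leak. Each sample includes a few milliseconds of
# clock overhead from starting perl.
time_runs() (
  runs=$1
  shift
  "$@" >/dev/null 2>&1   # warmup pass, result discarded
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(now_ms)
    "$@" >/dev/null 2>&1
    end=$(now_ms)
    echo $((end - start))
    i=$((i + 1))
  done
)

# Example: five timed runs of a placeholder command after one warmup.
#   time_runs 5 sleep 0.1
```

Discarding the first run keeps disk-cache and first-launch effects out of the reported values, which is the trade-off this section describes.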

App selection

The default set is Relay, iTerm2, Ghostty, and Terminal.app. Alacritty and Kitty are deliberately left out because they serve a different audience (Linux-leaning, config-file workflow). You can add them yourself in benchmark.sh.

Real zsh config

Each terminal launches with the test user's default zsh — not a stripped-down test shell. Numbers reflect real workflows.

Fully reproducible

Every script lives in the public repo. Every table above has a run date. You can re-run the measurements on your own hardware and compare with ours.

Run it yourself

If you don't trust our numbers, clone the repo and run the suite on your own machine. It takes about 15 minutes.

git clone https://github.com/relayapp/relay-benchmarks
cd relay-benchmarks
./benchmark.sh --check
./benchmark.sh

Published your own results? Send us the link and we'll link back. The more independent reproductions, the harder it is to accuse anyone of cherry-picking.