Performance & Benchmarks

Reproducible numbers: token efficiency vs React, layout throughput, and the frame profiler — all runnable with `vel bench`.

Vel makes three performance claims, and ships a benchmark for each so you can verify them yourself. Everything below is reproducible on your machine with one command.

All numbers were measured on an Apple M1 Pro. Absolute timings vary by machine; what stays stable is the ratio and the before/after comparison. Run the suite locally with `vel bench` to get your own figures.

TL;DR

2.33× fewer tokens

Authoring a UI in .vel takes 2.33× fewer LLM tokens than the same UI in idiomatic React + Tailwind (1,481 vs 3,458 tokens across six UIs, about 57% fewer).

~65–93× faster layout

Text-measure caching cut warm relayout of a 1,000-row tree from 19.6 ms → 0.30 ms, and 10,000 rows from 206 ms → 2.2 ms.

~1 ms frame work

Steady-state CPU work per frame (layout + reactive update) for the showcase is ~1 ms p50, leaving ample headroom inside a vsync-bound frame.

1. Token efficiency — `vel bench token`

The headline reason Vel exists: an AI agent (or a human) should be able to describe a UI in as few tokens as possible. Fewer tokens = cheaper, faster, more reliable generation.

Method. Six canonical UIs are written twice, once in .vel and once in idiomatic React + Tailwind at equal fidelity, and tokenized with o200k_base (the tiktoken encoding used by GPT-4o, taken here as a public proxy for modern frontier tokenizers). Every .vel file in the corpus compiles (velc --check passes), so these are real programs, not strawmen.

UI	Vel tokens	React tokens	React / Vel
counter	108	258	2.39×
login	159	500	3.14×
pricing	409	801	1.96×
settings	307	829	2.70×
dashboard	275	594	2.16×
profile	223	476	2.13×
Total	1,481	3,458	2.33×
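The ratio column can be rechecked from the raw counts. A minimal sketch in Python, using only the numbers from the table above (the variable and function names are ours, not part of the benchmark):

```python
# Token counts copied from the table above: (Vel, React) per UI.
counts = {
    "counter": (108, 258),
    "login": (159, 500),
    "pricing": (409, 801),
    "settings": (307, 829),
    "dashboard": (275, 594),
    "profile": (223, 476),
}


def ratio(vel: int, react: int) -> float:
    """React tokens divided by Vel tokens."""
    return react / vel


vel_total = sum(v for v, _ in counts.values())
react_total = sum(r for _, r in counts.values())

for name, (vel, react) in counts.items():
    print(f"{name:<10} {ratio(vel, react):.2f}x")
print(f"{'total':<10} {ratio(vel_total, react_total):.2f}x")

# Share of tokens saved relative to React, as a percentage.
saved_pct = 100 * (1 - vel_total / react_total)
print(f"Vel uses {saved_pct:.0f}% fewer tokens than React on this corpus")
```

Note that the headline 2.33× is the ratio of the summed counts (3,458 / 1,481), so larger UIs weigh more; it is not the mean of the six per-UI ratios.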

pip3 install tiktoken
vel bench token
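For a sense of what the token count involves, here is a minimal sketch of counting tokens with tiktoken's `o200k_base` encoding. It is not the `vel bench token` implementation; the function names, and the file paths in the usage note, are hypothetical.

```python
from typing import Protocol


class Encoder(Protocol):
    """Anything with an encode(text) method that returns token ids."""

    def encode(self, text: str) -> list[int]: ...


def load_o200k() -> Encoder:
    """Load the o200k_base encoding (requires `pip3 install tiktoken`)."""
    import tiktoken  # imported lazily so the helpers below work without it

    return tiktoken.get_encoding("o200k_base")


def count_tokens(source: str, encoder: Encoder) -> int:
    """Number of tokens the encoder produces for one source file's text."""
    return len(encoder.encode(source))


def react_over_vel(vel_source: str, react_source: str, encoder: Encoder) -> float:
    """React / Vel token ratio for one UI written in both forms."""
    return count_tokens(react_source, encoder) / count_tokens(vel_source, encoder)
```

Usage would look like `react_over_vel(open("counter.vel").read(), open("Counter.tsx").read(), load_o200k())`, where both paths are placeholders for one pair of files from the corpus.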

Why this is a fair comparison

The React side is idiomatic React + Tailwind — including import, useState, and event handlers (real authored tokens).
Where React is genuinely terser (e.g. .map() over radio options) it gets that credit; the benchmark does not handicap React.
The React baseline is raw Tailwind, not a component library. A shadcn/ui baseline would narrow the gap on some cases — Vel’s built-in design system is part of the advantage, and we say so.

2. Layout throughput — `vel bench layout`

This applies the js-framework-benchmark protocol to Vel’s core: build a tree of N rows (swatch + flexible label + action button) and measure tree construction and layout with no window, GPU, or vsync involved. It is pure CPU work, covering the same scope as React’s render + reconcile + layout.
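The shape of such a headless measurement can be sketched as below. This is an illustration only: `build_tree` and `layout` are hypothetical stand-ins written in Python, not Vel’s C++ engine, and the timings it prints say nothing about Vel itself. It shows the protocol: build N rows, time build and layout separately, and report the median over repeated runs.

```python
import statistics
import time
from dataclasses import dataclass


@dataclass
class Row:
    """Stand-in for one row: swatch + flexible label + action button."""
    label: str
    width: float = 0.0


def build_tree(n: int) -> list[Row]:
    return [Row(label=f"row {i}") for i in range(n)]


def layout(rows: list[Row], container_width: float = 800.0) -> None:
    # Hypothetical fixed widths for the swatch and button; the label flexes.
    swatch, button = 24.0, 80.0
    for row in rows:
        row.width = container_width - swatch - button


def median_ms(fn, repeats: int = 15) -> float:
    """Median wall-clock time of fn() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


for n in (100, 1_000, 10_000):
    build = median_ms(lambda: build_tree(n))
    rows = build_tree(n)
    relayout = median_ms(lambda: layout(rows))
    print(f"{n:>6} rows  build {build:.3f} ms  layout {relayout:.3f} ms")
```

Reporting the median rather than the mean keeps a single slow run (a scheduler hiccup, a GC pause) from skewing the figure.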

Result (M1 Pro, median)

rows	build (tree)	layout (cold)	create (build + layout)	relayout (warm)	relayout before caching (warm)
100	0.03 ms	0.04 ms	0.07 ms	0.04 ms	1.9 ms
1,000	0.26 ms	0.30 ms	0.55 ms	0.30 ms	19.6 ms
10,000	1.99 ms	2.22 ms	4.21 ms	2.23 ms	206 ms

vel bench layout

Tree construction was always fast: C++ allocation builds 10,000 rows of widgets in 2 ms. (We expect JS createElement to be slower, but no JS timing is reported on this page.) The bottleneck was layout re-measuring every glyph through FreeType on every frame.

The fix: text-measure caching

The win came from two process-lifetime caches in the text rasterizer (engine/src/text/FreeTypeRasterizer.cpp):

Per-glyph advance cache

Keyed on (face, pixelSize, codepoint). Common characters are loaded once, so even cold layout of varied text is fast.

Per-string width cache

Keyed on (face, pixelSize, string). Re-measuring unchanged text is now O(1) — the steady-state case for scrolling and animation.

Glyph advances never change for a given face + size, so the caches live for the process lifetime and never need invalidation. Result: warm relayout of 1,000 rows dropped from 19.6 ms to 0.30 ms (~65×), and 10,000 rows from 206 ms to 2.2 ms (~93×). A 10k-row list now lays out in about 2.2 ms, well inside the 16.7 ms budget of a 60 fps frame, which is the steady-state interaction budget that Figma-class apps live in.
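The two-level idea can be sketched as follows. This is a Python illustration, not the code in FreeTypeRasterizer.cpp: the class name, the `load_advance` callback (standing in for a FreeType glyph load), the "Inter" face name, and the fake metric in the usage are all assumptions. It also ignores kerning and shaping, which a real rasterizer may apply.

```python
from typing import Callable, Dict, Tuple

# Hypothetical signature: (face, pixel_size, codepoint) -> horizontal advance.
AdvanceLoader = Callable[[str, int, int], float]


class TextMeasureCache:
    """Per-glyph advance cache plus per-string width cache, never invalidated."""

    def __init__(self, load_advance: AdvanceLoader) -> None:
        self._load_advance = load_advance
        self._advances: Dict[Tuple[str, int, int], float] = {}
        self._widths: Dict[Tuple[str, int, str], float] = {}
        self.glyph_loads = 0  # counts calls into the (expensive) loader

    def advance(self, face: str, pixel_size: int, codepoint: int) -> float:
        key = (face, pixel_size, codepoint)
        cached = self._advances.get(key)
        if cached is None:
            self.glyph_loads += 1
            cached = self._load_advance(face, pixel_size, codepoint)
            self._advances[key] = cached
        return cached

    def width(self, face: str, pixel_size: int, text: str) -> float:
        key = (face, pixel_size, text)
        cached = self._widths.get(key)
        if cached is None:
            # Cold path: sum per-glyph advances, each loaded at most once.
            cached = sum(self.advance(face, pixel_size, ord(ch)) for ch in text)
            self._widths[key] = cached
        return cached


# Fake metric for demonstration: every glyph is half an em wide.
cache = TextMeasureCache(lambda face, size, cp: 0.5 * size)
first = cache.width("Inter", 16, "hello")   # loads h, e, l, o
loads_after_cold = cache.glyph_loads
second = cache.width("Inter", 16, "hello")  # string-cache hit, no glyph loads
print(first == second, loads_after_cold, cache.glyph_loads)  # True 4 4
```

The warm path is a single dictionary lookup, which is why relayout of unchanged text stops depending on glyph loading at all. The trade-off of never invalidating is that memory grows with the number of distinct strings measured.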

3. The frame profiler — `VEL_PERF`

Every Vel app has a built-in frame profiler. Set VEL_PERF and run any app:

VEL_PERF=60 ./build/showcase     # report every 60 frames

[vel-perf] n=60  build p50/p99=970/2600us  gpu p50/p99=7148/8262us  \
           frame p50/p95/p99/max=8195/9012/9304/9304us  ~122fps

build — measure + place + tick (layout + reactive update): clean CPU cost.
gpu — command encoding plus the vsync wait (so it’s not pure GPU time).
frame — the whole render pass.

On an M1 Pro the showcase runs vsync-bound at ~120 fps (ProMotion). The meaningful figure is build ≈ 1 ms p50 (970 µs in the sample above), well under the ~8.3 ms available per frame at 120 Hz.
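To track these figures over time, the report line can be parsed. A minimal sketch, assuming the format is exactly as in the sample output above (the regular expression and function names are ours, and the real format may differ between versions):

```python
import re

# The sample report from above, joined onto one line.
SAMPLE = (
    "[vel-perf] n=60  build p50/p99=970/2600us  gpu p50/p99=7148/8262us  "
    "frame p50/p95/p99/max=8195/9012/9304/9304us  ~122fps"
)

# Matches e.g. "build p50/p99=970/2600us" -> name, labels, values.
FIELD = re.compile(r"(\w+) ((?:\w+/)*\w+)=((?:\d+/)*\d+)us")


def parse_perf_line(line: str) -> dict[str, dict[str, int]]:
    """Return {"build": {"p50": 970, ...}, ...} with values in microseconds."""
    report = {}
    for name, labels, values in FIELD.findall(line):
        report[name] = dict(zip(labels.split("/"), map(int, values.split("/"))))
    return report


def fps_from_frame_us(frame_us: int) -> float:
    """Frames per second implied by a frame time in microseconds."""
    return 1_000_000 / frame_us


report = parse_perf_line(SAMPLE)
print(report["build"]["p50"] / 1000, "ms build p50")
print(round(fps_from_frame_us(report["frame"]["p50"])), "fps from frame p50")
```

For the sample line, the frame p50 of 8,195 µs works out to about 122 fps, matching the `~122fps` the profiler prints.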

Reproduce everything

vel bench            # token + layout
vel bench token      # tokens vs React (needs: pip3 install tiktoken)
vel bench layout     # headless build + layout for 100 / 1k / 10k rows
VEL_PERF=60 vel run ./build/showcase   # live per-frame profile

The benchmark sources live in benchmarks/ with full methodology and honest caveats.

What’s still open

Performance work is never done. Known next steps: dirty-subtree layout (skip re-measuring unchanged subtrees), Flex intrinsic-size caching, and a startup/bundle benchmark for the web (WASM size, time-to-interactive) — Vel’s biggest remaining unknown vs React on the web.