Testing Firefox Wasm Tail Call
Or why assumptions are not always correct.
– Created: February 16, 2024 UTC
– Edited: July 28, 2024 UTC
– Tags: Optimization, Wasm, Interpreters
Lore
Interpreting comes at a cost: the more you nest, the more complex things become. This is especially true on the web, where any user program already sits on layers and layers of interfaces. It gets pretty funny: I can't even run a ZX Spectrum emulator written in JavaScript at more than a few frames per second.
A lot of software targeting the web ships its own languages and interpreters (such as Godot and GDScript), and in real-time, simulation-intensive cases the overhead matters.
One thing that is often suggested for improving interpreter performance is tail calling. And it works empirically on native platforms. Check this post.
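To make the idea concrete, here is a minimal sketch of tail-call dispatch in C, in the spirit of the post above. It assumes Clang's musttail attribute; the opcode set and names are my own illustration, not the actual benchmark sources.

#include <stdint.h>
#include <stdio.h>

typedef struct Vm Vm;
typedef void (*Handler)(Vm *vm, const uint8_t *ip);

struct Vm {
    int64_t acc;
    const uint8_t *code;  /* program start, for the backward jump */
    const Handler *table; /* one handler per opcode */
};

/* A guaranteed tail call compiles to a plain jump, so handlers
   chain into each other without growing the stack. */
#define DISPATCH(vm, ip) \
    __attribute__((musttail)) return (vm)->table[*(ip)]((vm), (ip) + 1)

enum { OP_INC, OP_LOOP, OP_HALT };

static void op_inc(Vm *vm, const uint8_t *ip) {
    vm->acc += 1;
    DISPATCH(vm, ip);
}

static void op_loop(Vm *vm, const uint8_t *ip) {
    if (vm->acc < 100000000) {
        /* jump back to the start of the program */
        __attribute__((musttail)) return vm->table[vm->code[0]](vm, vm->code + 1);
    }
    DISPATCH(vm, ip); /* falls through to OP_HALT */
}

static void op_halt(Vm *vm, const uint8_t *ip) {
    (void)vm;
    (void)ip; /* returning here unwinds the whole chain at once */
}

int main(void) {
    static const Handler table[] = { op_inc, op_loop, op_halt };
    const uint8_t code[] = { OP_INC, OP_LOOP, OP_HALT };
    Vm vm = { .acc = 0, .code = code, .table = table };
    table[code[0]](&vm, code + 1);
    printf("%lld\n", (long long)vm.acc);
    return 0;
}

The appeal is that each handler is a small function carrying its hot state in arguments, so the compiler can keep that state in registers across the whole chain instead of spilling it around one big dispatch loop.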
And so I wondered: could it work on the Wasm platform? Firefox recently shipped support for its experimental spec, after all.
Results
I based the test interpreter on the fast-interpreters post linked above. Sources are available on GitHub. It does nothing but increment a counter until 100000000, which exercises nothing but instruction decoding, exactly the thing we are testing here.
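For contrast, here is the same hypothetical increment-until-100000000 program under jump-table dispatch, sketched with the GCC/Clang computed-goto extension (again an illustration, not the benchmark sources):

#include <stdint.h>
#include <stdio.h>

enum { OP_INC, OP_LOOP, OP_HALT };

static int64_t run(const uint8_t *code) {
    /* label addresses indexed by opcode (computed goto) */
    static const void *table[] = { &&do_inc, &&do_loop, &&do_halt };
    const uint8_t *ip = code;
    int64_t acc = 0;

#define DISPATCH() goto *table[*ip++]
    DISPATCH();
do_inc:
    acc += 1;
    DISPATCH();
do_loop:
    if (acc < 100000000)
        ip = code; /* jump back to the start of the program */
    DISPATCH();
do_halt:
    return acc;
#undef DISPATCH
}

int main(void) {
    const uint8_t code[] = { OP_INC, OP_LOOP, OP_HALT };
    printf("%lld\n", (long long)run(code));
    return 0;
}

Here all dispatch funnels through a single function, so the compiler has one big frame to allocate registers in, and every handler jumps through the same indirect-branch sites.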
First, native:
time ./jump-table
real 0m3,094s
user 0m3,082s
sys 0m0,012s
time ./tail-call
real 0m2,491s
user 0m2,485s
sys 0m0,005s
A run time decrease of 19.3%! Formidable.
But on the web it's more interesting:
tail-call.wasm (cold): 10874ms - timer ended
jump-table.wasm (cold): 6610ms - timer ended
Tail calls are actually slower in this case (by 39.2%), and I'm not yet sure why.
Intuition proven wrong, but testing it first proved useful :)
Note: I'm running this on an amd64 CPU, on stable Firefox 122.0, with everything compiled by Zig's Clang version 16.
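For the Wasm builds, the tail-call feature has to be enabled at compile time. I haven't reproduced the exact build command here, but with a Clang of that vintage it would look roughly like this (the target and flags are assumptions, not the project's actual build line):

zig cc -target wasm32-freestanding -O2 -mtail-call -o tail-call.wasm tail-call.c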
Seems like JIT compilation is the way to go on the web: fold everything down to Wasm bytecode. But overall, the plain jump-table version's overhead relative to native is a mere 113.6%, which I would say isn't critical for a lot of cases, especially if the interpreter is intended mostly as an interface adapter, which is the case with GDScript.