Testing Firefox Wasm Tail Call
Or why assumptions are not always correct.
– Created: February 16, 2024 UTC
– Edited: July 28, 2024 UTC
– Tags: Optimization, Wasm, Interpreters
Lore
Interpreting comes at a cost: the more you nest, the more complex things become. This is especially true on the web, where any user program already sits on layers and layers of interfaces. It gets pretty funny: I can't even run a ZX Spectrum emulator written in JavaScript at more than a few frames per second.
A lot of software targeting the web ships its own languages and interpreters (such as Godot and GDScript), and in real-time, simulation-intensive cases the overhead matters.
One thing that is often suggested for improving interpreter performance is tail calling. And it works empirically on native platforms. Check this post.
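To make the idea concrete, here is a minimal sketch of tail-call dispatch in C, in the spirit of the post above. It assumes Clang's musttail attribute; the opcode set and names are my own illustration, not the actual benchmark sources.

#include <stdint.h>
#include <stdio.h>

typedef struct Vm Vm;
typedef void (*Handler)(Vm *vm, const uint8_t *ip);

struct Vm {
    int64_t acc;
    const uint8_t *code;  /* program start, for the backward jump */
    const Handler *table; /* one handler per opcode */
};

/* A guaranteed tail call compiles to a plain jump, so handlers
   chain into each other without growing the stack. */
#define DISPATCH(vm, ip) \
    __attribute__((musttail)) return (vm)->table[*(ip)]((vm), (ip) + 1)

enum { OP_INC, OP_LOOP, OP_HALT };

static void op_inc(Vm *vm, const uint8_t *ip) {
    vm->acc += 1;
    DISPATCH(vm, ip);
}

static void op_loop(Vm *vm, const uint8_t *ip) {
    if (vm->acc < 100000000) {
        /* jump back to the start of the program */
        __attribute__((musttail)) return vm->table[vm->code[0]](vm, vm->code + 1);
    }
    DISPATCH(vm, ip); /* falls through to OP_HALT */
}

static void op_halt(Vm *vm, const uint8_t *ip) {
    (void)vm;
    (void)ip; /* returning here unwinds the whole chain at once */
}

int main(void) {
    static const Handler table[] = { op_inc, op_loop, op_halt };
    const uint8_t code[] = { OP_INC, OP_LOOP, OP_HALT };
    Vm vm = { .acc = 0, .code = code, .table = table };
    table[code[0]](&vm, code + 1);
    printf("%lld\n", (long long)vm.acc);
    return 0;
}

The appeal is that each handler is a small function carrying its hot state in arguments, so the compiler can keep that state in registers across the whole chain instead of spilling it around one big dispatch loop.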
And so I wondered: could it work on the Wasm platform? Firefox recently shipped support for its experimental spec, after all.
Results
I based the test interpreter on the fast-interpreters post linked above. Sources are available on GitHub. It does nothing but increment a counter until 100000000, which exercises nothing but instruction decoding, exactly the thing we are testing here.
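For contrast, here is the same hypothetical increment-until-100000000 program under jump-table dispatch, sketched with the GCC/Clang computed-goto extension (again an illustration, not the benchmark sources):

#include <stdint.h>
#include <stdio.h>

enum { OP_INC, OP_LOOP, OP_HALT };

static int64_t run(const uint8_t *code) {
    /* label addresses indexed by opcode (computed goto) */
    static const void *table[] = { &&do_inc, &&do_loop, &&do_halt };
    const uint8_t *ip = code;
    int64_t acc = 0;

#define DISPATCH() goto *table[*ip++]
    DISPATCH();
do_inc:
    acc += 1;
    DISPATCH();
do_loop:
    if (acc < 100000000)
        ip = code; /* jump back to the start of the program */
    DISPATCH();
do_halt:
    return acc;
#undef DISPATCH
}

int main(void) {
    const uint8_t code[] = { OP_INC, OP_LOOP, OP_HALT };
    printf("%lld\n", (long long)run(code));
    return 0;
}

Here all dispatch funnels through a single function, so the compiler has one big frame to allocate registers in, and every handler jumps through the same indirect-branch sites.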
First, native:
time ./jump-table
real 0m3,094s
user 0m3,082s
sys 0m0,012s
time ./tail-call
real 0m2,491s
user 0m2,485s
sys 0m0,005s
A run time decrease of 19.3%! Formidable.
But on the web it's more interesting:
tail-call.wasm (cold): 10874ms - timer ended
jump-table.wasm (cold): 6610ms - timer ended
Tail calls are actually slower in this case (by 39.2%), and I'm not yet sure why.
Intuition proven wrong, but testing it first proved useful :)
Note: I'm running this on an amd64 CPU, on stable Firefox 122.0, with everything compiled by Zig's Clang version 16.
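For the Wasm builds, the tail-call feature has to be enabled at compile time. I haven't reproduced the exact build command here, but with a Clang of that vintage it would look roughly like this (the target and flags are assumptions, not the project's actual build line):

zig cc -target wasm32-freestanding -O2 -mtail-call -o tail-call.wasm tail-call.c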
Seems like JIT compilation is the way to go on the web: fold everything down to Wasm bytecode. But overall, the plain jump-table version's overhead relative to native is a mere 113.6%, which I would say isn't critical for a lot of cases, especially if the interpreter is intended mostly as an interface adapter, which is the case with GDScript.