In this presentation, we explore the compilation of OCaml to WebAssembly (Wasm).
The limitations of JavaScript as the default language of the web led to the development of Wasm, a secure, modular language with predictable performance.
However, compiling garbage-collected languages to Wasm presents challenges, including the need to compile or re-implement the runtime.
The Wasm working groups are developing extensions, such as Wasm-GC, to ease the compilation of garbage-collected languages.
We present Wasocaml, an OCaml to Wasm-GC compiler.
We detail different strategies for mapping the OCaml value representation to Wasm-GC, as well as our compilation scheme.
Finally, we describe how we plan to handle the C/JavaScript FFIs and effect handlers within Wasocaml.
\end{abstract}
% TODO: talk about effect handlers at the end of the abstract
% TODO:
% I wish you said a little more about:
The general rule for the Wasm committee is to only include features
with a demonstrated use case. As there are currently very few
compilers targeting the GC-proposal, some features lacked
conclusive evidence of their usefulness. An example is the \mintinline{wast}{i31ref}
type (unboxed 31-bit integers that can be used wherever a reference is expected), which is not required by the Dart compiler (the only compiler targeting the GC-proposal at the time).
Wasocaml demonstrates the usefulness of \mintinline{wast}{i31ref}.
It also validates the GC-proposal on a functional language.
We presented Wasocaml to the Wasm-GC working group~\cite{AC23}.
This helped convince the working group to keep \mintinline{wast}{i31ref} in the proposal.
% TODO: at this point, readers have no idea what the i31 ref type is
% the wording suggests that it's important
is an opaque type representing a value from the embedder. References cannot be stored in
the linear memory of Wasm; thus, they cannot appear inside OCaml values when using the
previously described compilation scheme.
In order to use references, we require a completely different compilation strategy,
one that does not use the linear memory at all.
Our strategy is close to the native OCaml one, which we now describe.
\subsection{Native OCaml Value Representation}
If $b_{0} = 0$, then the whole value is a pointer:
\end{center}
\end{tcolorbox}
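The two cases are distinguished by a single test on $b_{0}$. As an illustration only, the OCaml sketch below models this encoding, assuming that an OCaml \mintinline{ocaml}{int} stands in for an $n$-bit machine word; the function names are hypothetical, and the scalar encoding follows the case described next.

```ocaml
(* Illustrative model of native OCaml's value representation.
   Assumption: an OCaml [int] stands in for an n-bit machine word.
   All function names are hypothetical. *)

(* b0 = 0: the whole word is a (word-aligned) pointer. *)
let is_pointer (w : int) : bool = w land 1 = 0

(* b0 = 1: the n-1 most significant bits hold a small scalar. *)
let is_scalar (w : int) : bool = w land 1 = 1

(* Shift the scalar into the n-1 high bits and set b0 to 1. *)
let encode_scalar (x : int) : int = (x lsl 1) lor 1

(* Arithmetic shift right: b0 is discarded and the sign is preserved. *)
let decode_scalar (w : int) : int = w asr 1

let () =
  let w = encode_scalar 21 in
  Printf.printf "word=%d scalar=%b value=%d\n" w (is_scalar w) (decode_scalar w)
```

Decoding uses an arithmetic shift so that negative scalars keep their sign; since an encoded scalar always has $b_{0} = 1$, it can never be confused with a word-aligned pointer.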
%TODO: split in two (add a brace at the bottom left to show the size of the pointer/scalar and remove the right part)
If $b_{0} = 1$, then the $n - 1$ most significant bits are a small scalar and $b_{0}$ is ignored:
Reading its value is implemented by getting the cell and casting it to an integer
(ref.cast $block (local.get $x))))))
\end{wast}
% TODO: clarify the index stuff:
Thus, since the first cell of the array holds the block tag, accessing field $n$ of the OCaml block amounts
to accessing field $n + 1$ of the array:
\begin{wast}
(func $snd (param $x eqref) (result eqref)
need recursive Wasm types.
We use the Flambda IR of the OCaml compiler as input for the Wasm generation.
At this step of the compilation chain, most of the
high-level OCaml-specific optimisations have already been applied. Also in
this IR, closure conversion has already been performed. Most of the
constructs of this IR map quite directly to Wasm ones.
\subsection{Control flow}
% TODO: explain that in OCaml one can generate exceptions at runtime thus we can't use Wasm exn directly but have to use an identifier
Control flow and continuations have direct equivalents in the Wasm \mintinline{wast}{block},
\mintinline{wast}{loop}, \mintinline{wast}{br_table}, and \mintinline{wast}{if} instructions.
Low-level OCaml primitives to handle exceptions are quite similar to Wasm ones.
In OCaml, it is possible to generate new exceptions at runtime, \emph{e.g.}, with the
\mintinline{ocaml}{let exception} syntax or with functors and first-class modules.
This is not possible with the Wasm exception proposal, where exception tags are fixed when the module is instantiated.
Thus, we use the same Wasm exception everywhere and handle the rest ourselves,
using an identifier to discriminate between different exceptions.
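To make this scheme concrete, the following OCaml sketch models it; it is not Wasocaml's implementation. The OCaml exception \mintinline{ocaml}{Wasm_exn} plays the role of the single shared Wasm exception, and integer identifiers, drawn from a counter whenever an exception is created, play the role of the identifiers. The names, the integer identifiers, and the string payloads are assumptions made for illustration.

```ocaml
(* Illustrative model only: [Wasm_exn] stands in for the single Wasm
   exception shared by all OCaml exceptions. Names, integer identifiers,
   and string payloads are hypothetical simplifications. *)
exception Wasm_exn of int * string

(* Exceptions may be created at runtime (e.g. [let exception]), so each
   new exception receives a fresh identifier when it is created. *)
let counter = ref 0

let fresh_exception () : int =
  incr counter;
  !counter

(* Raising any OCaml exception throws the shared exception with its id. *)
let raise_exn (id : int) (payload : string) = raise (Wasm_exn (id, payload))

(* A handler for exception [id]: other identifiers match no case,
   so OCaml re-raises them to the enclosing handler. *)
let catch (id : int) (body : unit -> 'a) (handler : string -> 'a) : 'a =
  try body () with
  | Wasm_exn (id', payload) when id' = id -> handler payload

let () =
  let not_found = fresh_exception () in
  let local_exn = fresh_exception () in
  (* [local_exn] escapes the inner handler and reaches the outer one. *)
  let result =
    catch local_exn
      (fun () ->
        catch not_found (fun () -> raise_exn local_exn "boom") (fun _ -> "inner"))
      (fun payload -> "outer: " ^ payload)
  in
  print_endline result
```

In this model, a handler that sees an identifier it does not own lets the exception propagate, mirroring what a compiled handler would have to do: catch the shared Wasm exception, compare identifiers, and rethrow on a mismatch.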
\subsection{Currification}
The main difference revolves around functions. In OCaml, functions
we do not expect to be competitive with the native code
compiler, the performance degradation seems to be almost
constant (around twice as slow).
% TODO: give a ratio for wasm/bytecode ??
We are able to compile an OCaml implementation of the
Knuth-Bendix algorithm~\cite{KB83}.
For now, however, running it leads to runtime errors.
We have not been able to reproduce the error on a smaller example.
It is unclear whether our generated Wasm is incorrect or whether we are
hitting a bug in V8's experimental support for the Wasm extensions
we require.
% TODO: this is quite a minimal form of performance evaluation...
Compared to a JavaScript VM, a Wasm compiler is a much simpler beast
that can compile ahead of time. For this reason, various Wasm engines
are expected to behave quite similarly. They do not show any of the wild
Indeed, compiling OCaml to JS using jsoo yields code that is also
about twice as slow as native code in the best cases, but that
can sometimes be much slower in an unpredictable fashion.
Currently, no other Wasm runtime supports all the extensions we require.
SpiderMonkey does not support tail calls. The reference interpreter implementations of
the various extensions are split across separate repositories, and merging them requires