You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

434 lines
17 KiB

\usepackage{fontspec} % encoding
\usepackage{hyperref} % internal and external links (e.g. mail, www)
\usepackage{authblk} % author affiliation handling
\usepackage{fancyvrb} % source code
\input{styledef.tex} % source code
\setmainfont{Linux Libertine O}
\setsansfont{Linux Biolinum O}
%\setsansfont{TeX Gyre Heros}
%% \setmonofont[Scale=MatchLowercase]{RobotoMono Nerd Font}
\title{Wasocaml: a compiler from OCaml to WebAssembly}
\author[1,2]{Léo Andrès}
\author[1]{Pierre Chambart}
\affil[1]{OCamlPro SAS, 21 rue de Châtillon, 75014 Paris, France}
\affil[2]{Université Paris-Saclay, CNRS, ENS Paris-Saclay, Inria, Laboratoire Méthodes Formelles, 91190 Gif-sur-Yvette, France}
In this presentation, we will explore the compilation of garbage-collected languages, such as Java or OCaml, to WebAssembly (Wasm). The limitations of JavaScript as the web's default language led to the development of Wasm, a secure and predictable-performance modular language. However, compiling garbage-collected languages to Wasm presents challenges, including the need to compile or re-implement the runtime. Various techniques for representing values in memory are discussed, with a focus on OCaml's approach. An extension called Wasm-GC is introduced, enabling the compilation of garbage-collected languages to Wasm by incorporating features like int31 and garbage-collected structs. The paper presents Wasocaml, a complete OCaml compiler for Wasm-GC, and discusses benchmarks and future work in compiling garbage-collected languages to WebAssembly.
Javascript is widely thought as being the de facto language of the
web. However, it does show its limitations when it comes to
performance, security and safety. In order to remedy this, WebAssembly
(Wasm) has been developped as a secure modular language of predictable
performance. Its usage is expanding beyond the web: finding
applications in the cloud (Fastly, Cloudflare), in the creation of
portable binaries.
Parler de JS\_of\_ocaml ici
The first version of WASM was meant to compile libraries in languages
such as C, C++ or Rust to expose them for consumption by a Javascript
program. It was designed with the explicit aim that future versions
would catter to the specific needs of other languages and uses.
The current version of WASM can be seen as simplified C.
%% (func $fact (param $x i32) (result i32)
%% (if (i32.eq (local.get $x) (i32.const 0)) (then (i32.const 1))
%% (else (i32.mul (local.get $x) (call $fact (i32.sub (local.get $x) (i32.const 1))))
%% ))
For instance a WASM program could only manipulate integers and floating
point values, not garbage collected ones, like Javascript
objects. This would have obviously been useful in a browser context,
but would have required some interaction with the GC of the runtime
This led to various extensions, in particular some that matter to
We will discuss the most notable ones, the reference, gc, and exception extensions
\subsubsection{References and GC}
The aim of those proposal is to allow the program to manipulate
references to different kind of values like, opaque objects, functions
and garbage collected values. Since WASM has to be memory safe, those
can't be exposed as direct pointers, hence there is a type system
verifying the proper manipulation of such values.
%% \texttt{
%% (type $f1 (func (param i32) (result i32)))
%% (type $t1 (array (ref i31))
%% (type $t2 (struct ((field $a (ref $t1)) (field $b (ref $f1)) (field $c i32))))
%% (func (param $x (ref $t2)) (result i32)
%% (i31.get_u (array.get $t1 (i32.const 0) (struct.get $t2 $a (local.get $x))))
%% }
But WASM being a compilation target rather than a programming
language, it needs to be able to represent the values of any kind of
source language. It was not deemed possible to have a type system
powerful enough to do that. Instead the decision was to have a simple
one, but to rely on dynamic cast to fill the gaps, but garanty that
those cast are very efficient.
Il n'y a que des struct des array des i31 des externref et des funcref
There is a subtyping type hierarchy that controls which cast are
allowed. Some types are predefined by WASM: any, eq, extern, func,
i31. Other are user defined: array and struct. There is a structural
subtyping relation between user ones. Upcasts are implicits, downcasts
are explicits and incorrect ones leads to runtime errors.
It is possible to dynamically test for type compatibility.
\paragraph{Contribution to the proposals}
The general rule for the WASM comittee is to only include features
with a demonstrated use case. As there are currently very few
compilers targeting the GC-proposal some features where lacking
conclusive evidence of their usefullness. In particular the i31
reference type which was not required by the Dart compiler which was
the only one targeting the GC-proposal at the time.
Wasocaml was written to validate the GC-proposal on a functionnal
language and among other things demonstrate the usefulness of i31.
Another serious limitation is the absence of an exception
mechanism. This was not a problem for compiling either C or Rust, but
C++ does require them. It is of course possible to encode those, but
this has a significative runtime cost and limits the interractions
with Javascript (that do have exceptions).
WASM 1 didn't include the possibility to tail-call functions. This is
obviously required for correct compilation of functionnal languages
\section{Compiling OCaml to WASM}
\subsection{Targeting WASM 1}
There have been attempts at targeting the early versions of WASM from OCAML.
The first obvious one being just compiling the OCaml runtime (including the GC) and interpreter using emscripten (a C to WASM compiler),
and WASICAML a smarter and more efficient version of this approach.
Both suffer from the same problem as C programs running on WASM: they
can't natively manipulate external values, like DOM objects in the
\subsection{With WASM-GC}
To properly interract with the embedder (TODO define before), we need to leverage the reference extensions.
This requires a completely different compilation strategy.
\subsubsection{Native OCaml value representation}
OCaml uses an uniform representation of values known as pointer
tagging. One bit is used to indicate if the value is a small scalar or
a pointer to a heap allocated block. Granted we are working on $n$
bits values, we have to choose one bit that will be the \emph{tag
bit}. For instance, if we choose the least significant bit, we start
by testing its value:
If $b_{0} = 0$, then the whole value is a pointer:
If $b_{1} = 1$, then the $n - 1$ most significant bits are a small scalar and $b_{0}$ is ignored:
In the second case, we talk about \emph{small scalars} instead of
scalars because we can only represents $2^{n - 1}$ values instead of
the $2^n$ that are representable when all bits are available. For
pointers, we do not lose anything as they need to be \emph{aligned}
anyway and the last bit is always zero.
\subsubsection{OCaml value representation in WASM}
As in native code we need a uniform representation. We of course can't
use integers, as they can't be used as pointers. We need to use a
reference type. The most general one would be anyref. But we can be a
bit more precise and use eqref. This is the type of all values that
can be tested for physical equality. It is a super type of OCaml
values. This also allows us to test for equality without requireing an
additionnal cast.
\paragraph{Small scalars}
All OCaml values that would be represented as integers in the native
OCaml are represented as i31ref. This includes the int and char types,
but also all constant constructors of sum types.
The OCaml array type is directly represented as a WASM array.
For other kinds of blocks there are two possible choices. We can
either use a struct with a field for the tag and one field per OCaml
value field. The other choice is to use arrays of eqref with the tag
stored at position 0.
\paragraph{Block as struct}
A block of size one would be represented using the type \$block1
% (type $block1 (struct (field $tag i8) (field $field0 (ref eq))))
And for size two it would be \$block2
% (type $block2 (struct (field $tag i8) (field $field0 (ref eq)) (field $field1 (ref eq))))
We need \$block2 to be a subtype of \$block1. As we are using upstream
OCaml as the base for Wasocaml. In the OCaml IRs the primitives for
accessing a block field is untyped, thus the when accessing the n-th
field of the block the only information is that the block has size at
least n+1.
%% (func snd (param $x (ref eq)) (result (ref eq))
%% (struct.get $block2 1 (ref.cast $block2 (local.get $x))))
OCaml blocks can be arbitrarilly large, and very large ones do appear
in practice. In particular modules are compiled as blocks, and tends
to be on the larger side. We have seen in the wild some examples
containing thousands of fields.
WASM allows a subtyping chain of length 64, which is far from
sufficient for this encoding. But a variant can be devised to
circumvent that limitation: It is possible to represent those as trees
of structs.
\paragraph{Block as arrays}
With the array representation defined as type \$block
%% (type $block (array (ref eq)))
accessign the field 1 of the OCaml block amount to accesing the field 2 of the array
%% (func snd (param $x (ref eq)) (result (ref eq))
%% (array.get $block (i32.const 2) (ref.cast $block (local.get $x))))
and reading the tag is implemented by reading the field 0 and casting to an integer
%% (func tag (param $x (ref eq)) (result i32)
%% (i31.get_u (ref.cast i31ref
%% (array.get $block (i32.const 0) (ref.cast $block (local.get $x))))))
\paragraph{Block representation tradeoff}
The array representation is simpler, but requires (implicit) bound checks at each
field access and a cast to read the tag.
While the struct representation requires a more complex cast for the
WASM runtime (a subtyping test). A compiler propagating more types
could use finer WASM type information, providing a precise type to
each struct field. This could allowing fewer casts.
Notice that in practice we could measure an upper bound on the actual
runtime cost of cast. The V8 runtime allows to consider cast as noop
(only for test purpose). The measured speedup is around 10\%
\paragraph{Boxed numbers}
Ocaml boxed numbers are structs containing the number
%% (type $float (struct (field f64)))
%% (type $int32 (struct (field i32)))
WASM ref func are functions, not closures, hence we need to produce
values containing both the function and its environment. The only WASM
type construction that can contain both func ref and other values are
the structs.
Hence a closure is a struct containing a func ref and the captured
%% (type $closure1 (struct (field (ref func)) (field $fv1 (ref eq)) (field $fv2 (ref eq))))
The actual representation is a bit more complex, to reduce casts and
handle mutualy recursive functions. This is the only place where we
need recursive WASM types.
We use as input for the WASM generation the flambda IR of the OCaml
compiler. This is a place in the compilation chain where most of the
high level OCaml specific optimisations are already applied. Also in
this IR, the closure conversion pass is alread done. Most of the
constructions of this IR maps quite directly to WASM ones:
\item Control flow and continuations have a direct equivalent with WASM blocks loops, br\_table and if
\item Low level OCaml exception constructs are almost indistinguishable from WASM ones
The main difference revolves around functions. In OCaml, the functions
have only one argument. But most of the functions practically, takes
more than one. Without any special management this would mean that
most of the code would be functions producting closures that would be
immediately applied. To handle that, internally OCaml do have
functions taking multiple arguments, with some special handling for
cases where they are partially applied. This information is explicit
at the Flambda level. The transformation handle that would normally
occur in the native OCaml compiler in a further pass, called cmmgen.
Hence we have to duplicate this for in Wasocaml. Compiling this
requires some kind of structural subtyping on closures such that
closures for functions of arity one are supertypes of all the other
closures. Thanksfully there are easy encodings for that in WASM.
%% TODO example ml partial apply
\paragraph{stack representation}
From that point, the translation to WASM is quite
straightforward. Most of the remaining work revolves around
translating from let bound expressions to a stack based language
without producing too naive code. Also, we don't need to care too much
about low level WASM specific optimisations as we rely on Binaryen
(a quite efficient WASM to WASM optimizer) for those.
%% TODO trouver une ref à citer pour binaryen
The main optimisation available in native OCaml that we are missing is
number unboxing. As OCaml values have an uniform representation, only
small scalars can fit in a direct OCaml value. This means that the
types int64, nativeint, int32 and float have to be boxed. In numerical
code, lots of intermediate values of type float are created, and in
that case, the allocation time of the box of numbers would completely
dominate the actual computation time. To limit that problem there is
an optimisation called unboxing performed during the cmmgen pass that
tries to eliminate most of the obviously useless allocations. As this
pass is performed after flambda, and was not required to produce a
complete working compiler, this was left for future work. Note that
the end plan is to use the next version of flambda, which does a much
better job for unboxing.
We have plans to provide ways to interract with both standard C
libraries using the usual OCaml FFI and Javascript libraries using
js\_of\_ocaml FFI.
Very recently, some extension to Clang where added that would allow
compiling C code with almost no change to the binding. We could
provide modified mlvalue.h headers files to use instead of the
standard ones replacing the usual macros with hand written WASM
functions. The only limitation that we forsee is that the Field macro
couldn't be an Lvalue anymore, a new Set\_field macro would be needed
instead (as was originaly proposed for OCaml 5)
To interract with Javascript code, we would need one more extension
(the string ref proposal).
Almost all external calls in jsoo goes through the
Js.Unsafe.meth\_call (of type 'a -> string -> any array -> 'b) function
which can be exposed to the WASM module as a function of type
%% (func (param $obj externref) (param $method stringref) (param $args $anyarray) (result externref))
This calls a method of name \$method on the object \$obj with the
arguments \$args . The javascript side is the one managing all the
dynamic typing.
\section{Expected performances}
We can't produce yet benchmarks on real sized programs. But
preliminary ones are quite convincing. While we of course don't expect
to be competitive with the native code compiler, the performance
degradation seems to be almost constant (around twice slower).
Compared to Javascript VM, a WASM compiler is a much simpler beast
that can compile ahead of time. There are none of the wild
impredictability that browser tend to demonstrate on Javascript. Also
various engines are expected to behave quite similarly.
\section{Future extensions}
Our compiler is base on OCaml 4.14. So effect handlers where out of
the scope. There are three strategies to handle them.
\paragraph{CPS compilation}
In an OCaml only program, with a whole program CPS transformation it
is possible to represent effect handlers as continuations. This
strategy has been proved to be effective by jsoo.
This is not ideal because this prevents the use of the target language
stack and instead stores it in the heap, with a non negligible
cost. It can be mitigated by only applying the transformation to the
code that can't be proved to not use effect handlers.
\paragraph{js promise}
\paragraph{WASM stack switching}
stack switching/effect handlers
\section{Related works other compilers}
js\_of\_ocaml / wasm\_of\_ocaml
\section{Current state}
Our current implementation only provides some part of the OCaml stdlib
externals as hand written WASM.
discuttions i31, subtyping chain, GC proposal ça marche bien pour nous