You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
434 lines
17 KiB
434 lines
17 KiB
\documentclass{article}
|
|
|
|
\usepackage[english]{babel}
|
|
|
|
\usepackage{fontspec} % encoding
|
|
\usepackage{hyperref} % internal and external links (e.g. mail, www)
|
|
\usepackage{authblk} % author affiliation handling
|
|
\renewcommand\Affilfont{\itshape\small}
|
|
\usepackage{bbding}
|
|
\usepackage{fancyvrb} % source code
|
|
\input{styledef.tex} % source code
|
|
\usepackage[most]{tcolorbox}
|
|
\setmainfont{Linux Libertine O}
|
|
\setsansfont{Linux Biolinum O}
|
|
%\setsansfont{TeX Gyre Heros}
|
|
%% \setmonofont[Scale=MatchLowercase]{RobotoMono Nerd Font}
|
|
|
|
\title{Wasocaml: a compiler from OCaml to WebAssembly}
|
|
|
|
\author[1,2]{Léo Andrès}
|
|
\author[1]{Pierre Chambart}
|
|
|
|
\affil[1]{OCamlPro SAS, 21 rue de Châtillon, 75014 Paris, France}
|
|
\affil[2]{Université Paris-Saclay, CNRS, ENS Paris-Saclay, Inria, Laboratoire Méthodes Formelles, 91190 Gif-sur-Yvette, France}
|
|
|
|
\begin{document}
|
|
|
|
\maketitle
|
|
|
|
\begin{abstract}
|
|
In this presentation, we will explore the compilation of garbage-collected languages, such as Java or OCaml, to WebAssembly (Wasm). The limitations of JavaScript as the web's default language led to the development of Wasm, a secure and predictable-performance modular language. However, compiling garbage-collected languages to Wasm presents challenges, including the need to compile or re-implement the runtime. Various techniques for representing values in memory are discussed, with a focus on OCaml's approach. An extension called Wasm-GC is introduced, enabling the compilation of garbage-collected languages to Wasm by incorporating features like int31 and garbage-collected structs. The paper presents Wasocaml, a complete OCaml compiler for Wasm-GC, and discusses benchmarks and future work in compiling garbage-collected languages to WebAssembly.
|
|
\end{abstract}
|
|
|
|
\section{Context}
|
|
\subsection{WASM}
|
|
Javascript is widely thought as being the de facto language of the
|
|
web. However, it does show its limitations when it comes to
|
|
performance, security and safety. In order to remedy this, WebAssembly
|
|
(Wasm) has been developped as a secure modular language of predictable
|
|
performance. Its usage is expanding beyond the web: finding
|
|
applications in the cloud (Fastly, Cloudflare), in the creation of
|
|
portable binaries.
|
|
|
|
Parler de JS\_of\_ocaml ici
|
|
|
|
The first version of WASM was meant to compile libraries in languages
|
|
such as C, C++ or Rust to expose them for consumption by a Javascript
|
|
program. It was designed with the explicit aim that future versions
|
|
would catter to the specific needs of other languages and uses.
|
|
|
|
The current version of WASM can be seen as simplified C.
|
|
|
|
%% (func $fact (param $x i32) (result i32)
|
|
%% (if (i32.eq (local.get $x) (i32.const 0)) (then (i32.const 1))
|
|
%% (else (i32.mul (local.get $x) (call $fact (i32.sub (local.get $x) (i32.const 1))))
|
|
%% ))
|
|
|
|
For instance a WASM program could only manipulate integers and floating
|
|
point values, not garbage collected ones, like Javascript
|
|
objects. This would have obviously been useful in a browser context,
|
|
but would have required some interaction with the GC of the runtime
|
|
environment.
|
|
|
|
This led to various extensions, in particular some that matter to
|
|
OCaml.
|
|
|
|
\subsection{Extensions}
|
|
|
|
We will discuss the most notable ones, the reference, gc, and exception extensions
|
|
|
|
\subsubsection{References and GC}
|
|
|
|
The aim of those proposal is to allow the program to manipulate
|
|
references to different kind of values like, opaque objects, functions
|
|
and garbage collected values. Since WASM has to be memory safe, those
|
|
can't be exposed as direct pointers, hence there is a type system
|
|
verifying the proper manipulation of such values.
|
|
|
|
%% \texttt{
|
|
%% (type $f1 (func (param i32) (result i32)))
|
|
%% (type $t1 (array (ref i31))
|
|
%% (type $t2 (struct ((field $a (ref $t1)) (field $b (ref $f1)) (field $c i32))))
|
|
|
|
%% (func (param $x (ref $t2)) (result i32)
|
|
%% (i31.get_u (array.get $t1 (i32.const 0) (struct.get $t2 $a (local.get $x))))
|
|
%% }
|
|
|
|
But WASM being a compilation target rather than a programming
|
|
language, it needs to be able to represent the values of any kind of
|
|
source language. It was not deemed possible to have a type system
|
|
powerful enough to do that. Instead the decision was to have a simple
|
|
one, but to rely on dynamic cast to fill the gaps, but garanty that
|
|
those cast are very efficient.
|
|
|
|
Il n'y a que des struct des array des i31 des externref et des funcref
|
|
|
|
There is a subtyping type hierarchy that controls which cast are
|
|
allowed. Some types are predefined by WASM: any, eq, extern, func,
|
|
i31. Other are user defined: array and struct. There is a structural
|
|
subtyping relation between user ones. Upcasts are implicits, downcasts
|
|
are explicits and incorrect ones leads to runtime errors.
|
|
It is possible to dynamically test for type compatibility.
|
|
|
|
\paragraph{Contribution to the proposals}
|
|
|
|
The general rule for the WASM comittee is to only include features
|
|
with a demonstrated use case. As there are currently very few
|
|
compilers targeting the GC-proposal some features where lacking
|
|
conclusive evidence of their usefullness. In particular the i31
|
|
reference type which was not required by the Dart compiler which was
|
|
the only one targeting the GC-proposal at the time.
|
|
|
|
Wasocaml was written to validate the GC-proposal on a functionnal
|
|
language and among other things demonstrate the usefulness of i31.
|
|
|
|
\subsubsection{Exception}
|
|
|
|
Another serious limitation is the absence of an exception
|
|
mechanism. This was not a problem for compiling either C or Rust, but
|
|
C++ does require them. It is of course possible to encode those, but
|
|
this has a significative runtime cost and limits the interractions
|
|
with Javascript (that do have exceptions).
|
|
|
|
\subsubsection{Tail-calls}
|
|
|
|
WASM 1 didn't include the possibility to tail-call functions. This is
|
|
obviously required for correct compilation of functionnal languages
|
|
|
|
\section{Compiling OCaml to WASM}
|
|
|
|
\subsection{Targeting WASM 1}
|
|
|
|
There have been attempts at targeting the early versions of WASM from OCAML.
|
|
The first obvious one being just compiling the OCaml runtime (including the GC) and interpreter using emscripten (a C to WASM compiler),
|
|
and WASICAML a smarter and more efficient version of this approach.
|
|
|
|
Both suffer from the same problem as C programs running on WASM: they
|
|
can't natively manipulate external values, like DOM objects in the
|
|
browser.
|
|
|
|
\subsection{With WASM-GC}
|
|
|
|
To properly interract with the embedder (TODO define before), we need to leverage the reference extensions.
|
|
This requires a completely different compilation strategy.
|
|
|
|
\subsubsection{Native OCaml value representation}
|
|
|
|
OCaml uses an uniform representation of values known as pointer
|
|
tagging. One bit is used to indicate if the value is a small scalar or
|
|
a pointer to a heap allocated block. Granted we are working on $n$
|
|
bits values, we have to choose one bit that will be the \emph{tag
|
|
bit}. For instance, if we choose the least significant bit, we start
|
|
by testing its value:
|
|
|
|
\begin{center}
|
|
\includegraphics[width=10em]{figure_ocaml_bit1.mps}
|
|
\end{center}
|
|
|
|
If $b_{0} = 0$, then the whole value is a pointer:
|
|
|
|
\begin{center}
|
|
\includegraphics[width=20em]{figure_ocaml_bit2.mps}
|
|
\end{center}
|
|
|
|
If $b_{1} = 1$, then the $n - 1$ most significant bits are a small scalar and $b_{0}$ is ignored:
|
|
|
|
\begin{center}
|
|
\includegraphics[width=20em]{figure_ocaml_bit3.mps}
|
|
\end{center}
|
|
|
|
In the second case, we talk about \emph{small scalars} instead of
|
|
scalars because we can only represents $2^{n - 1}$ values instead of
|
|
the $2^n$ that are representable when all bits are available. For
|
|
pointers, we do not lose anything as they need to be \emph{aligned}
|
|
anyway and the last bit is always zero.
|
|
|
|
\begin{tcolorbox}[breakable]
|
|
\input{layout.ml.tex}
|
|
\end{tcolorbox}
|
|
|
|
\begin{center}
|
|
\includegraphics[width=13em]{figure_ocaml_layout.mps}
|
|
\end{center}
|
|
|
|
\subsubsection{OCaml value representation in WASM}
|
|
|
|
As in native code we need a uniform representation. We of course can't
|
|
use integers, as they can't be used as pointers. We need to use a
|
|
reference type. The most general one would be anyref. But we can be a
|
|
bit more precise and use eqref. This is the type of all values that
|
|
can be tested for physical equality. It is a super type of OCaml
|
|
values. This also allows us to test for equality without requireing an
|
|
additionnal cast.
|
|
|
|
\paragraph{Small scalars}
|
|
|
|
All OCaml values that would be represented as integers in the native
|
|
OCaml are represented as i31ref. This includes the int and char types,
|
|
but also all constant constructors of sum types.
|
|
|
|
\paragraph{Arrays}
|
|
|
|
The OCaml array type is directly represented as a WASM array.
|
|
|
|
\paragraph{Blocks}
|
|
|
|
For other kinds of blocks there are two possible choices. We can
|
|
either use a struct with a field for the tag and one field per OCaml
|
|
value field. The other choice is to use arrays of eqref with the tag
|
|
stored at position 0.
|
|
|
|
\paragraph{Block as struct}
|
|
|
|
A block of size one would be represented using the type \$block1
|
|
|
|
% (type $block1 (struct (field $tag i8) (field $field0 (ref eq))))
|
|
|
|
And for size two it would be \$block2
|
|
|
|
% (type $block2 (struct (field $tag i8) (field $field0 (ref eq)) (field $field1 (ref eq))))
|
|
|
|
We need \$block2 to be a subtype of \$block1. As we are using upstream
|
|
OCaml as the base for Wasocaml. In the OCaml IRs the primitives for
|
|
accessing a block field is untyped, thus the when accessing the n-th
|
|
field of the block the only information is that the block has size at
|
|
least n+1.
|
|
|
|
%% (func snd (param $x (ref eq)) (result (ref eq))
|
|
%% (struct.get $block2 1 (ref.cast $block2 (local.get $x))))
|
|
|
|
OCaml blocks can be arbitrarilly large, and very large ones do appear
|
|
in practice. In particular modules are compiled as blocks, and tends
|
|
to be on the larger side. We have seen in the wild some examples
|
|
containing thousands of fields.
|
|
|
|
WASM allows a subtyping chain of length 64, which is far from
|
|
sufficient for this encoding. But a variant can be devised to
|
|
circumvent that limitation: It is possible to represent those as trees
|
|
of structs.
|
|
|
|
\paragraph{Block as arrays}
|
|
|
|
With the array representation defined as type \$block
|
|
|
|
%% (type $block (array (ref eq)))
|
|
|
|
accessign the field 1 of the OCaml block amount to accesing the field 2 of the array
|
|
|
|
%% (func snd (param $x (ref eq)) (result (ref eq))
|
|
%% (array.get $block (i32.const 2) (ref.cast $block (local.get $x))))
|
|
|
|
and reading the tag is implemented by reading the field 0 and casting to an integer
|
|
|
|
%% (func tag (param $x (ref eq)) (result i32)
|
|
%% (i31.get_u (ref.cast i31ref
|
|
%% (array.get $block (i32.const 0) (ref.cast $block (local.get $x))))))
|
|
|
|
\paragraph{Block representation tradeoff}
|
|
|
|
The array representation is simpler, but requires (implicit) bound checks at each
|
|
field access and a cast to read the tag.
|
|
|
|
While the struct representation requires a more complex cast for the
|
|
WASM runtime (a subtyping test). A compiler propagating more types
|
|
could use finer WASM type information, providing a precise type to
|
|
each struct field. This could allowing fewer casts.
|
|
|
|
Notice that in practice we could measure an upper bound on the actual
|
|
runtime cost of cast. The V8 runtime allows to consider cast as noop
|
|
(only for test purpose). The measured speedup is around 10\%
|
|
|
|
\paragraph{Boxed numbers}
|
|
|
|
Ocaml boxed numbers are structs containing the number
|
|
|
|
%% (type $float (struct (field f64)))
|
|
%% (type $int32 (struct (field i32)))
|
|
|
|
\paragraph{Closures}
|
|
|
|
WASM ref func are functions, not closures, hence we need to produce
|
|
values containing both the function and its environment. The only WASM
|
|
type construction that can contain both func ref and other values are
|
|
the structs.
|
|
|
|
Hence a closure is a struct containing a func ref and the captured
|
|
variables
|
|
|
|
%% (type $closure1 (struct (field (ref func)) (field $fv1 (ref eq)) (field $fv2 (ref eq))))
|
|
|
|
The actual representation is a bit more complex, to reduce casts and
|
|
handle mutualy recursive functions. This is the only place where we
|
|
need recursive WASM types.
|
|
|
|
\subsubsection{compilation}
|
|
|
|
We use as input for the WASM generation the flambda IR of the OCaml
|
|
compiler. This is a place in the compilation chain where most of the
|
|
high level OCaml specific optimisations are already applied. Also in
|
|
this IR, the closure conversion pass is alread done. Most of the
|
|
constructions of this IR maps quite directly to WASM ones:
|
|
|
|
\begin{itemize}
|
|
\item Control flow and continuations have a direct equivalent with WASM blocks loops, br\_table and if
|
|
\item Low level OCaml exception constructs are almost indistinguishable from WASM ones
|
|
\end{itemize}
|
|
|
|
\paragraph{currification}
|
|
The main difference revolves around functions. In OCaml, the functions
|
|
have only one argument. But most of the functions practically, takes
|
|
more than one. Without any special management this would mean that
|
|
most of the code would be functions producting closures that would be
|
|
immediately applied. To handle that, internally OCaml do have
|
|
functions taking multiple arguments, with some special handling for
|
|
cases where they are partially applied. This information is explicit
|
|
at the Flambda level. The transformation handle that would normally
|
|
occur in the native OCaml compiler in a further pass, called cmmgen.
|
|
Hence we have to duplicate this for in Wasocaml. Compiling this
|
|
requires some kind of structural subtyping on closures such that
|
|
closures for functions of arity one are supertypes of all the other
|
|
closures. Thanksfully there are easy encodings for that in WASM.
|
|
|
|
%% TODO example ml partial apply
|
|
|
|
\paragraph{stack representation}
|
|
|
|
From that point, the translation to WASM is quite
|
|
straightforward. Most of the remaining work revolves around
|
|
translating from let bound expressions to a stack based language
|
|
without producing too naive code. Also, we don't need to care too much
|
|
about low level WASM specific optimisations as we rely on Binaryen
|
|
(a quite efficient WASM to WASM optimizer) for those.
|
|
%% TODO trouver une ref à citer pour binaryen
|
|
|
|
\paragraph{unboxing}
|
|
|
|
The main optimisation available in native OCaml that we are missing is
|
|
number unboxing. As OCaml values have an uniform representation, only
|
|
small scalars can fit in a direct OCaml value. This means that the
|
|
types int64, nativeint, int32 and float have to be boxed. In numerical
|
|
code, lots of intermediate values of type float are created, and in
|
|
that case, the allocation time of the box of numbers would completely
|
|
dominate the actual computation time. To limit that problem there is
|
|
an optimisation called unboxing performed during the cmmgen pass that
|
|
tries to eliminate most of the obviously useless allocations. As this
|
|
pass is performed after flambda, and was not required to produce a
|
|
complete working compiler, this was left for future work. Note that
|
|
the end plan is to use the next version of flambda, which does a much
|
|
better job for unboxing.
|
|
|
|
\subsubsection{FFI}
|
|
|
|
We have plans to provide ways to interract with both standard C
|
|
libraries using the usual OCaml FFI and Javascript libraries using
|
|
js\_of\_ocaml FFI.
|
|
|
|
\paragraph{C}
|
|
|
|
Very recently, some extension to Clang where added that would allow
|
|
compiling C code with almost no change to the binding. We could
|
|
provide modified mlvalue.h headers files to use instead of the
|
|
standard ones replacing the usual macros with hand written WASM
|
|
functions. The only limitation that we forsee is that the Field macro
|
|
couldn't be an Lvalue anymore, a new Set\_field macro would be needed
|
|
instead (as was originaly proposed for OCaml 5)
|
|
|
|
\paragraph{js\_of\_ocaml}
|
|
|
|
To interract with Javascript code, we would need one more extension
|
|
(the string ref proposal).
|
|
|
|
Almost all external calls in jsoo goes through the
|
|
Js.Unsafe.meth\_call (of type 'a -> string -> any array -> 'b) function
|
|
which can be exposed to the WASM module as a function of type
|
|
|
|
%% (func (param $obj externref) (param $method stringref) (param $args $anyarray) (result externref))
|
|
|
|
This calls a method of name \$method on the object \$obj with the
|
|
arguments \$args . The javascript side is the one managing all the
|
|
dynamic typing.
|
|
|
|
\section{Expected performances}
|
|
|
|
We can't produce yet benchmarks on real sized programs. But
|
|
preliminary ones are quite convincing. While we of course don't expect
|
|
to be competitive with the native code compiler, the performance
|
|
degradation seems to be almost constant (around twice slower).
|
|
|
|
Compared to Javascript VM, a WASM compiler is a much simpler beast
|
|
that can compile ahead of time. There are none of the wild
|
|
impredictability that browser tend to demonstrate on Javascript. Also
|
|
various engines are expected to behave quite similarly.
|
|
|
|
\section{Future extensions}
|
|
|
|
Our compiler is base on OCaml 4.14. So effect handlers where out of
|
|
the scope. There are three strategies to handle them.
|
|
|
|
\paragraph{CPS compilation}
|
|
|
|
In an OCaml only program, with a whole program CPS transformation it
|
|
is possible to represent effect handlers as continuations. This
|
|
strategy has been proved to be effective by jsoo.
|
|
|
|
This is not ideal because this prevents the use of the target language
|
|
stack and instead stores it in the heap, with a non negligible
|
|
cost. It can be mitigated by only applying the transformation to the
|
|
code that can't be proved to not use effect handlers.
|
|
|
|
\paragraph{js promise}
|
|
|
|
\paragraph{WASM stack switching}
|
|
|
|
stack switching/effect handlers
|
|
|
|
\section{Related works other compilers}
|
|
|
|
\subsection{Dart}
|
|
\subsection{Scheme}
|
|
|
|
\subsection{OCaml}
|
|
js\_of\_ocaml / wasm\_of\_ocaml
|
|
|
|
\section{Current state}
|
|
|
|
Our current implementation only provides some part of the OCaml stdlib
|
|
externals as hand written WASM.
|
|
|
|
|
|
\section{conclusion}
|
|
|
|
discuttions i31, subtyping chain, GC proposal ça marche bien pour nous
|
|
|
|
\end{document}
|
|
|