zapashcanon 2023-07-20 21:54:07 +02:00
parent 7328cf970f
commit c9bf6e696a
Signed by: zapashcanon
GPG Key ID: 8981C3C62D1D28F1
3 changed files with 110 additions and 54 deletions


@ -1,3 +1,12 @@
@inproceedings{Lat08,
title={LLVM and Clang: Next generation compiler technology},
author={Lattner, Chris},
booktitle={The BSD conference},
volume={5},
pages={1--20},
year={2008}
}
@article{VB14,
title={From bytecode to JavaScript: the Js\_of\_ocaml compiler},
author={Vouillon, J{\'e}r{\^o}me and Balat, Vincent},

42
figure_type_hierarchy.mps Normal file

@ -0,0 +1,42 @@
%!PS
%%BoundingBox: -291 -404 1056 -16
%%HiResBoundingBox: -290.62822 -403.93263 1055.963 -16.60393
%%Creator: MetaPost 2.02
%%CreationDate: 2023.04.26:1847
%%Pages: 1
%*Font: cmr10 82.71204 9.96265 31:a00000000000ac84f98
%%BeginProlog
%%EndProlog
%%Page: 1 1
0 0 0 setrgbcolor
-64.33165 -52.21632 moveto
(an) cmr10 82.71204 fshow
20.67801 -52.21632 moveto
(y) cmr10 82.71204 fshow
-40.20714 -218.26025 moveto
(eq) cmr10 82.71204 fshow
-290.62822 -403.93263 moveto
(i31) cmr10 82.71204 fshow
-126.82465 -399.56743 moveto
(struct) cmr10 82.71204 fshow
143.35522 -384.30457 moveto
(arra) cmr10 82.71204 fshow
288.56102 -384.30457 moveto
(y) cmr10 82.71204 fshow
0 4.15111 dtransform truncate idtransform setlinewidth pop [] 0 setdash
1 setlinecap 1 setlinejoin 10 setmiterlimit
newpath -56.81195 -238.02513 moveto
-237.78458 -332.08862 lineto stroke
newpath -7.14182 -250.94705 moveto
-20.79291 -332.08862 lineto stroke
newpath 56.81195 -238.02513 moveto
237.78458 -332.08862 lineto stroke
4.15111 0 dtransform exch truncate exch idtransform pop setlinewidth
newpath 0 -84.90288 moveto
0 -166.04431 lineto stroke
503.27621 -238.9883 moveto
(func) cmr10 82.71204 fshow
828.27472 -235.70572 moveto
(extern) cmr10 82.71204 fshow
showpage
%%EOF


@ -142,6 +142,12 @@ subtyping relation between user types. Upcasts are implicit, downcasts
are explicit and incorrect ones lead to runtime errors.
It is possible to dynamically test for type compatibility.
\begin{tcolorbox}[colback=lightbg]
\begin{center}
\includegraphics[width=20em]{figure_type_hierarchy.mps}
\end{center}
\end{tcolorbox}
\paragraph{Contribution to the proposals.}
The general rule for the Wasm committee is to only include features
@ -169,7 +175,7 @@ with JavaScript (that has exceptions).
Wasm1 does not allow tail-calls to be optimized. This is
a show-stopper when compiling functional languages. In the tail-call proposal, new instructions such as \mintinline{wast}{return_call} guarantee that a tail-call will be optimized.
\section{Value representation}
\section{Value Representation}
There have been attempts at targeting the early versions of Wasm from OCaml.
The first one simply compiles the OCaml runtime (including the GC)
@ -247,7 +253,7 @@ anyway and the least significant bit is always zero.
\end{center}
\end{tcolorbox}
\subsection{OCaml value representation in Wasm}
\subsection{OCaml Value Representation in Wasm}
As in native OCaml, we use a uniform representation. We cannot
use integers, as they cannot be used as pointers. We need to use a
@ -274,7 +280,7 @@ either use a struct with a field for the tag and one field per OCaml
value field. The other choice is to use arrays of \mintinline{wast}{eqref}
with the tag stored at position $0$.
\subsubsection{Block as struct}
\subsubsection{Blocks as Structs}
A block of size one is represented using the type \mintinline{wast}{$block1}:
@ -296,7 +302,7 @@ And for size two it is \mintinline{wast}{$block2}:
% TODO: add subtype annotation
% TODO: block is a subtype of block1 because In the OCaml IRs ...
In the OCaml IRs, the primitives for accessing a block field is untyped, thus when accessing the $n$-th field of the block
In the OCaml IRs, the primitive for accessing a block field is untyped, thus when accessing the $n$-th field of the block
the only information is that the block has size at least $n + 1$. Thus we need \mintinline{wast}{$block2} to be a subtype of \mintinline{wast}{$block1}.
It could be possible to propagate size metadata in the IRs but that wouldn't be sufficient because of untyped primitives such as \mintinline{ocaml}{Obj.field}
@ -309,7 +315,7 @@ that we need to support in order to be compatible with existing code.
\end{wast}
OCaml blocks can be arbitrarily large, and very large ones do appear
in practice. In particular modules are compiled as blocks and tend
in practice. In particular, modules are compiled as blocks and tend
to be on the larger side. We have seen in the wild some examples
containing thousands of fields.
%
@ -325,7 +331,7 @@ which is described next.
% TODO: too long for a solution that we do not keep?
\subsubsection{Block as arrays}
\subsubsection{Blocks as Arrays}
We represent blocks with an array of eqref:
@ -335,7 +341,7 @@ We represent blocks with an array of eqref:
% TODO: clarify indice stuff
The tag is stored in the first cell (index 0) of the array. Accessing the field $1$ of the OCaml block amount to accessing the field $2$ of the array:
The tag is stored in the first cell (index 0) of the array. Thus accessing the field $1$ of the OCaml block amounts to accessing the field $2$ of the array:
\begin{wast}
(func $snd (param $x eqref) (result eqref)
@ -356,7 +362,7 @@ And reading the tag is implemented by reading the field $0$ and casting to an in
(ref.cast $block (local.get $x))))))
\end{wast}
\subsubsection{Block representation tradeoff}
\subsubsection{Block Representation Tradeoffs}
The array representation is simpler but requires (implicit) bound checks at each
field access and a cast to read the tag.
@ -366,13 +372,13 @@ Wasm runtime (a subtyping test). A compiler propagating more types
could use finer Wasm type information, providing a precise type to
each struct field. This would allow fewer casts.
Notice that, in practice, we could measure an upper bound on the actual
runtime cost of cast. The V8 runtime allows to consider cast as noop
(only for test purpose). The measured speedup is around 10\%.
The V8 runtime allows casts to be treated as no-ops
(for testing purposes only). The speedup we measured is around 10\%.
That gives us an upper bound on the actual runtime cost of casts.
% TODO: move the fact that we abandon the first method to the end of the tradeoff discussion
\subsubsection{Boxed numbers}
\subsubsection{Boxed Numbers}
Raw Wasm scalars are not subtypes of \mintinline{wast}{eqref} thus they
cannot be used directly to represent OCaml boxed numbers. We need to box them inside a struct in order to make them compatible with
@ -403,43 +409,45 @@ variables. As an example, here is the type of a closure with two free variables:
\end{wast}
The actual representation is a bit more complex, to reduce casts and
handle mutually recursive functions. This is the only place where we
to handle mutually recursive functions. This is the only place where we
need recursive Wasm types.
\section{Compilation}
We use, as input for the Wasm generation, the flambda IR of the OCaml
compiler. This is a step of the compilation chain where most of the
We use the Flambda IR of the OCaml compiler as input for the Wasm generation.
This is a step of the compilation chain where most of the
high-level OCaml-specific optimisations are already applied. Also in
this IR, the closure conversion pass is already done. Most of the
this IR, the closure conversion pass is already performed. Most of the
constructions of this IR map quite directly to Wasm ones:
\begin{itemize}
\item control flow and continuations have a direct equivalent with Wasm \mintinline{wast}{block}, \mintinline{wast}{loop}, \mintinline{wast}{br_table} and \mintinline{wast}{if} instructions;
\item low level OCaml primitives to handle exceptions are almost indistinguishable from Wasm ones.
\item control flow and continuations have a direct equivalent with Wasm \mintinline{wast}{block}, \mintinline{wast}{loop}, \mintinline{wast}{br_table}, and \mintinline{wast}{if} instructions;
\item low-level OCaml primitives to handle exceptions are quite similar to Wasm ones.
\end{itemize}
% TODO: explain that in OCaml one can generate exceptions at runtime thus we can't use Wasm exn directly but have to use an identifier
\subsection{Currification}
The main difference revolves around functions. In OCaml, functions
have only one argument. However, in practice, functions look like they have
more than one. Without any special management this would mean that
most of the code would be functions producing closures that would be
immediately applied. To handle that, internally, OCaml do have
immediately applied. To handle that, internally, OCaml does have
functions taking multiple arguments, with some special handling for
times when they are partially applied. This information is explicit
at the Flambda level. In the native OCaml compiler, the transformation handling that
occurs in a later step called cmmgen.
Hence we have to duplicate this in Wasocaml. Compiling this
requires some kind of structural subtyping on closures such that:
Hence, we have to duplicate this in Wasocaml. Compiling this
requires some kind of structural subtyping on closures such that
closures for functions of arity one are supertypes of all the other
closures. Thankfully there are easy encodings for that in Wasm.
% TODO: explicit closures subtyping depending on the arity and the encoding in wasm
%% TODO example ml partial apply
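To make the distinction between full and partial application concrete, here is a small OCaml sketch. The function names are ours and purely illustrative. The comments describe the general idea rather than the exact code produced by any particular compiler.

\begin{minted}{ocaml}
(* [add3] looks like a three-argument function, but its type is
   int -> int -> int -> int. *)
let add3 x y z = x + y + z

(* Full application: all three arguments are available at once, so
   a compiler that knows the arity can call the code directly. *)
let full = add3 1 2 3

(* Partial application: only two arguments are given, so a closure
   remembering 1 and 2 has to be built. *)
let add_to_3 = add3 1 2
let partial = add_to_3 3

(* The naive curried reading of the same function: every
   application returns a new closure until the last one. *)
let add3_curried = fun x -> fun y -> fun z -> x + y + z
let naive = add3_curried 1 2 3
\end{minted}

All three calls compute the same result. The difference lies only in how many intermediate closures a naive compilation scheme would allocate.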
\subsection{Stack representation}
\subsection{Stack Representation}
From that point, the translation to Wasm is quite
straightforward. Most of the remaining work revolves around
Most of the remaining work revolves around
translating from let-bound expressions to a stack-based language
without producing overly naive code. Also, we do not need to care too much
about low-level Wasm-specific optimisations as we rely on Binaryen~\cite{Web15}
@ -453,38 +461,33 @@ small scalars can fit in a true OCaml value. This means that the
types \mintinline{ocaml}{nativeint}, \mintinline{ocaml}{int32},
\mintinline{ocaml}{int64} and \mintinline{ocaml}{float} have to be
boxed. In numerical code, lots of intermediate values of type float
are created, and in that case, the allocation time of the box of
are created, and in that case, the allocation time to box
numbers completely dominates the actual computation time.
To limit that problem, there is an optimisation called unboxing performed
during the cmmgen pass that tends to eliminate most of the obviously
useless allocations. As this pass is performed after flambda, and was
during the cmmgen pass that tends to eliminate most of the
useless allocations. As this pass is performed after Flambda, and was
not required to produce a complete working compiler, this was left
for future work. Note that the end plan is to use the next version of
flambda, which does a much better job for unboxing.
Flambda, which does a much better job at unboxing.
% TODO: does binaryen handle local unboxing ? e.g. (x +. y *. z)
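As an illustration of the kind of numerical code concerned, consider the following OCaml sketch. The function name is ours. Whether a given intermediate value is actually allocated depends on the compiler and its optimisation passes, so the comments only describe what may happen when no unboxing is performed.

\begin{minted}{ocaml}
(* Sum of squares over a float array. Without unboxing, the product
   x *. x and each new value of the accumulator may be allocated as
   a boxed float before being stored in the reference. *)
let sum_squares (a : float array) : float =
  let acc = ref 0.0 in
  Array.iter (fun x -> acc := !acc +. x *. x) a;
  !acc

let example = sum_squares [| 1.0; 2.0; 3.0 |]
\end{minted}

An unboxing pass aims at keeping such intermediate values as raw floating point numbers for as long as they do not escape.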
\subsection{FFI}
We have plans to provide ways to interract with both standard C
libraries using the usual OCaml FFI and JavaScript libraries using
js\_of\_ocaml FFI.
A lot of OCaml code in the wild interacts with C or JavaScript code through bindings written using dedicated FFI mechanisms. We plan to allow re-use of existing bindings when compiling to Wasm.
\paragraph{C}
Very recently, some extensions to Clang were added that allow
compiling C code with almost no change to the binding. We could
provide modified mlvalue.h headers files to use instead of the
standard ones replacing the usual macros with hand written Wasm
functions. The only limitation that we forsee is that the Field macro
will not be an Lvalue anymore, a new Set\_field macro is needed
instead (as was originally proposed for OCaml 5)
Some extensions recently added to Clang~\cite{Lat08} allow
compiling C code to Wasm in a way that makes reusing existing
OCaml bindings to C code possible with almost no change.
We would only have to provide a modified version of the OCaml native FFI header files, replacing the usual macros with hand-written Wasm
functions. The only limitation that we foresee is that the \mintinline{c}{Field} macro will not be an l-value anymore; a new \mintinline{c}{Set_field} macro will be needed
instead (as was originally proposed for OCaml 5).
\paragraph{Js\_of\_ocaml}
\paragraph{JavaScript}
To interact with JavaScript code, we need one more extension: the reference-typed strings proposal~\cite{Web22}.
Almost all external calls in jsoo goes through the
To re-use existing OCaml bindings to JavaScript code, we need one more extension: the reference-typed strings proposal~\cite{Web22}. Almost all external calls in the OCaml JavaScript FFI go through the
\mintinline{ocaml}{Js.Unsafe.meth_call} function
(of type \mintinline{ocaml}{'a -> string -> any array -> 'b})
which can be exposed to the Wasm module as a function of type:
@ -498,10 +501,9 @@ which can be exposed to the Wasm module as a function of type:
\end{wast}
This calls a method of name \mintinline{wast}{$method} on the object \mintinline{wast}{$obj} with the
arguments \mintinline{wast}{$args}. The JavaScript side is the one managing all the
dynamic typing.
arguments \mintinline{wast}{$args}. The JavaScript side manages all the dynamic typing.
\subsection{Expected performances}
\subsection{Expected Performances}
We cannot yet produce benchmarks on realistically sized programs, as
Wasocaml is still a prototype and is not yet integrated
@ -524,13 +526,16 @@ can sometimes be much slower in an unpredictable fashion.
% TODO: XL likes knuth-bendix, can we run it ? are we still under the 2x slower bound ?
Also various engines are expected to behave quite similarly.
% TODO: paragraph too short
% TODO: explain...
\section{Effect handlers support}
\section{Effect Handlers Support}
% TODO: simplify the structure, no subsections
Our compiler is based on OCaml 4.14. So effect handlers where out of
Our compiler is based on OCaml 4.14, so effect handlers were out of
scope. There are three strategies to handle them.
\subsection{CPS compilation}
\subsection{CPS Compilation}
In a program containing only OCaml code, it is possible to represent
effect handlers as continuations, using whole-program CPS transformation.
@ -542,7 +547,7 @@ This is the choice made by js\_of\_ocaml but it requires whole program transform
% TODO: does it work with language interop ?
\subsection{Wasm stack switching}
\subsection{Wasm Stack Switching}
There is an ongoing proposal, called stack switching~\cite{Web21b}, that exactly
matches the needs for OCaml effect handlers. With it the translation
@ -552,10 +557,10 @@ would be quite straightforward.
Another strategy is available when targeting runtimes providing
both Wasm and JavaScript. There is another ongoing proposal called
JavaScript promise integration~\cite{Web21a}. In that solution effects handlers can be
implemented using a JavaScript device. There are signs showing that
this proposal might land before the proper stack switching one, it
might be a temporary solution.
JavaScript promise integration~\cite{Web21a}.
With this proposal, effect handlers can be implemented using a JavaScript device.
This proposal is likely to land before the proper stack switching one,
so it might serve as a temporary solution.
\section{Related Work}
@ -623,7 +628,7 @@ Wasm\_of\_ocaml
%TODO:
% Unreleased, fork of jsoo targeting Wasm-GC
\section{Current state}
\section{Current State}
As the first version of the implementation was meant as a
demonstrator, it is a bit rough around the edges. In particular, only a