\paragraph{} A function is said to be \emph{polymorphic} when a single implementation of this function can be used with several different types. A polymorphic function may accept types that need to be treated differently at runtime, because they have different memory representations, use different calling conventions or need to be discriminated by the garbage collector. It is thus necessary to keep track of this information at runtime in order to interpret or to compile a polymorphic function.
% TODO: examples
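As a minimal illustration (a hedged sketch in Rust, not tied to any particular technique from this chapter), the generic function below has a single source-level definition but is used with two types that have very different memory representations:

```rust
// One polymorphic (generic) definition...
fn first<T: Clone>(pair: &(T, T)) -> T {
    pair.0.clone()
}

fn main() {
    // ...used with a machine integer, which fits in a register...
    let a = first(&(1i32, 2i32));
    // ...and with a heap-allocated string: a pointer plus length metadata.
    let b = first(&(String::from("x"), String::from("y")));
    assert_eq!(a, 1);
    assert_eq!(b, "x");
}
```

The implementation of `first` must somehow work for both shapes of data; the rest of this chapter surveys the ways this can be achieved.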
\paragraph{} Many implementation techniques for polymorphism exist, but only some of them have been described in research papers. We describe these techniques extensively, list their advantages and limitations, and compare them.
\section{By hand: meta-programming}
In this section, we discuss techniques used when a language does not properly support polymorphism.
\paragraph{Source code duplication by hand}Some programming languages do not provide means to write polymorphic code. In that case, it is still possible to duplicate code by hand to handle different types.
\paragraph{Source code generation} It is possible to avoid having to do this duplication by hand, by writing a program that generates the duplicated source code for each type.
\paragraph{Source code transformation} It is possible to generate source code using a preprocessor; this is described in~\cite{SY74} under the name \emph{syntax-macro extension}. It is used~\cite{EBN02} in the C programming language through its preprocessor~\cite{SW87}. The technique is described in~\cite{Reb17}. For instance, given the \commandbox{list.c} file:
Given a polymorphic function, it is (sometimes) possible to \emph{statically} collect all the type combinations it is going to be used with. The \emph{monomorphization} technique consists in producing a different \emph{specialised} function for each combination. This results in having only \emph{monomorphic} functions, hence the name monomorphization.
To build the set of type combinations for a given function, we iterate on the program's \emph{call graph}. At each \emph{call site}, the type combination of the current call is added to the set of the function being called.
Once the set is computed, the original polymorphic function is removed. All the monomorphic functions are generated and added to the program. Finally, each call site is updated to use the right function.
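As a sketch of what such a compiler produces (the specialised names below are hypothetical; real compilers use mangled symbols), a single generic function is replaced by one copy per type combination found at its call sites:

```rust
// Original polymorphic function, removed after monomorphization:
// fn max_of<T: PartialOrd>(a: T, b: T) -> T { if a > b { a } else { b } }

// Specialised copies generated for the two type combinations
// observed at the call sites (hypothetical names):
fn max_of_i32(a: i32, b: i32) -> i32 { if a > b { a } else { b } }
fn max_of_f64(a: f64, b: f64) -> f64 { if a > b { a } else { b } }

fn main() {
    // Each call site is rewritten to use the matching specialised version.
    assert_eq!(max_of_i32(2, 5), 5);
    assert_eq!(max_of_f64(1.5, 0.5), 1.5);
}
```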
Monomorphization is used by Rust's generics~\cite{Con18}, C++'s templates~\cite{Str88}, Ada's generic packages and subprograms~\cite{ND79,Bar80} and Coq's extraction~\cite{TAG18}.
% Why3~\cite{BP11}
Monomorphization may seem similar to the various techniques described in the previous section. The difference lies in the fact that monomorphization is dedicated to handling polymorphism, whereas metaprogramming only allows polymorphism incidentally. Even though C has macros, C is not considered to be a polymorphic language. In the same vein, even though C++, D and Rust respectively have macros, string mixins and procedural macros, they also have a templates/generics system dedicated to polymorphism. The term monomorphization should be used to talk about techniques designed to handle polymorphism.
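To make this contrast concrete, here is a hedged Rust sketch: the macro duplicates source code textually (metaprogramming), while the generic function expresses polymorphism directly and is monomorphized by the compiler:

```rust
// Metaprogramming: a macro that stamps out one function per type.
macro_rules! define_double {
    ($name:ident, $t:ty) => {
        fn $name(x: $t) -> $t { x + x }
    };
}
define_double!(double_i32, i32);
define_double!(double_f64, f64);

// Dedicated polymorphism: one generic definition, monomorphized by the compiler.
fn double<T: std::ops::Add<Output = T> + Copy>(x: T) -> T { x + x }

fn main() {
    assert_eq!(double_i32(21), 42);
    assert_eq!(double_f64(1.5), 3.0);
    assert_eq!(double(21i32), 42);
}
```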
\subsection{Advantages}
\paragraph{Code specialisation} The code produced by monomorphization is usually very efficient. Indeed, as the types are known precisely, it is possible to generate machine code fully specialised for a given type. This includes the usage of dedicated assembly instructions or calling conventions. Moreover, no runtime support is needed, as the generated code never has to act differently depending on the type.
\paragraph{Memory usage} Heap memory usage is optimal as we only store the values we need and no runtime metadata.
\subsection{Disadvantages}
\paragraph{Compilation cost}Compilation time and memory usage grow with the number of specialised functions per polymorphic function, which can become quite costly.
\paragraph{Binary size} For the same reason, the compiled binary can be quite large, as each polymorphic source function potentially leads to many monomorphic assembly functions.
\paragraph{Dynamic languages} Monomorphization does not work for dynamic languages, where the set of types a function is used with cannot be collected statically. TODO: explain a little bit more, with an example.
\paragraph{Modularity} Monomorphization is not modular: supporting separate compilation requires either access to the full source code, or keeping a representation of polymorphic code until link time. TODO: explain more, with an example
\subsubsection{Polymorphic recursion}
For instance, this implementation of \emph{Revisited Binary Random-Access Lists}:
\input{ral.rs.tex}
\end{tcolorbox}
The Rust compiler loops indefinitely on this file as it tries to generate an infinite number of specialised versions of the \texttt{len} function. Note that without the \texttt{main} function, the compiler succeeds. This is because monomorphization can only happen at link time: without a main function, the \texttt{len} function cannot be specialised, as its call sites are still partly unknown.
TODO: cutoff C++
It may be possible to statically detect that a function is not monomorphizable. To the best of our knowledge, this is an open problem.
TODO: if the program is typable in HM, is monomorphization possible?
TODO: ask JHJ whether he has solved the problem with the reduction to the rewriting system
\subsection{Optimisations}
\paragraph{Avoid useless specialisation} When a type parameter is not used by a function, it is not necessary to specialise the function for this parameter. This optimisation is performed by the Rust compiler, even though it is not very common for a type parameter to be unused (i.e. not to appear in the arguments or the result).
\paragraph{Polymorphization}Functions may contain closures, and closures inherit the type parameters of the function they belong to. For these inherited parameters, it is much more common to end up unused. The optimisation that prevents closures from being specialised for unused type parameters is called \emph{polymorphization}. The initial implementation for the Rust compiler is described in~\cite{Wo20}.
\section{Boxing everything}
The \emph{boxing} technique uses a \emph{uniform} representation of values: pointers. \emph{Scalar} values (integers, booleans, characters\ldots) are stored in a heap-allocated \emph{block} (sometimes called a \emph{box}) and represented by a pointer to this block. Blocks usually contain metadata describing their size and what kind of data they are made of. Values that were already pointers (arrays, lists\ldots) are still represented by a pointer, but instead of pointing directly to their data, they point to a block.
All values being pointers, polymorphic functions can deal with any type parameter in a single, uniform way. At runtime, some operations are necessary to box and unbox values when needed. When it is required to discriminate between pointers and scalars (e.g. by the garbage collector), the information is found in the metadata of the block. For instance:
Notice that \texttt{x} and \texttt{array2[0]} point to the same block. This is because Python pre-allocates small integers for optimisation reasons.
One may wonder why integer arrays are not represented as a block of scalars instead of a block of pointers to single integers, which would give the following memory layout:
This is because Python integers have arbitrary precision: an integer may use an unbounded amount of space. It is thus impossible to have a single block of raw integers, as they may all have different sizes. That is why each integer is in its own block, with its size stored in the metadata of the block.
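Rust normally monomorphizes, but a uniform boxed representation can be sketched in it: every value is hidden behind a pointer, and the metadata consulted at runtime is the type information carried by \texttt{dyn Any} (a sketch of the idea, not how any particular runtime implements it):

```rust
use std::any::Any;

// A uniform representation: every value, scalar or not, lives behind a pointer.
type Boxed = Box<dyn Any>;

fn box_value<T: Any>(v: T) -> Boxed {
    Box::new(v) // heap-allocate the value and return a pointer to its block
}

// Unboxing consults the runtime type metadata attached to the block.
fn unbox<T: Any + Copy>(b: &Boxed) -> Option<T> {
    b.downcast_ref::<T>().copied()
}

fn main() {
    // An "array" of boxed integers: a block of pointers to blocks.
    let array: Vec<Boxed> = vec![box_value(1i64), box_value(2i64)];
    assert_eq!(unbox::<i64>(&array[0]), Some(1));
    assert_eq!(unbox::<i64>(&array[1]), Some(2));
    // Asking for the wrong type is detected at runtime via the metadata.
    assert_eq!(unbox::<f64>(&array[0]), None);
}
```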
\subsection{Advantages}
\paragraph{Ease of implementation} Boxing is one of the easiest techniques to implement: it only requires inserting some code that boxes, unboxes and reads block metadata at runtime.
\paragraph{Compilation cost} The compilation cost of boxing is very low. Indeed, each polymorphic function is compiled only once.
\paragraph{Binary size} The binary size of code produced by boxing is also very low, as each polymorphic function is compiled into a single assembly blob.
\paragraph{Interpreted languages} Boxing is compatible with compiled and interpreted languages.
\paragraph{Modularity} Boxing is modular. Indeed, each function can be compiled without knowing all its call sites. Boxing is thus compatible with separate compilation.
\paragraph{Polymorphic recursion} Boxing is compatible with polymorphic recursion. If a function calls itself with an unbounded number of types, they will all have the same representation and can therefore be handled by the same assembly code.
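As a hedged sketch of why this works, the polymorphically recursive list from the Rust example above compiles fine once every element is boxed behind a uniform pointer type, because \texttt{len} then becomes an ordinary monomorphic function:

```rust
use std::any::Any;

// With a uniform boxed representation, the polymorphically recursive
// structure Seq<T> = Nil | Cons(T, Seq<(T, T)>) collapses into a single
// monomorphic type: every element is just a pointer-sized Box<dyn Any>.
enum Seq {
    Nil,
    Cons(Box<dyn Any>, Box<Seq>),
}

// One compiled function handles every instantiation.
fn len(s: &Seq) -> usize {
    match s {
        Seq::Nil => 0,
        Seq::Cons(_, rest) => 1 + len(rest),
    }
}

fn main() {
    // Elements of "different types" (an int, then a pair) share one representation.
    let l = Seq::Cons(Box::new(1i64),
            Box::new(Seq::Cons(Box::new((2i64, 3i64)), Box::new(Seq::Nil))));
    assert_eq!(len(&l), 2);
}
```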
\subsection{Disadvantages}
TODO: link; TODO: explain difference with Python (Java has some unboxed scalars outside of generics)
It is used in Java~\cite{Bra+98}. The difference with Python is that only values used inside generics need to be boxed. Java thus has \texttt{int} values, which are unboxed 32-bit integers, and \texttt{Integer} values, which are boxed 32-bit integers. Only \texttt{Integer} can be used inside generics such as \texttt{ArrayList}. Java also has arrays that contain only unboxed values, such as \texttt{int} arrays.
\begin{tcolorbox}[breakable]
\input{layout.java.tex}
\subsection{Advantages}
This technique has the same advantages as boxing everything. Moreover, in some cases it avoids boxing entirely and is thus more efficient.
\subsection{Disadvantages}
This technique has the same disadvantages as boxing everything. The implementation is a little more involved but remains straightforward.
\section{Pointer-tagging}
If $b_{1} = 1$, then the $n - 1$ most significant bits are a small scalar and $b_{1}$ is the tag bit; otherwise the word is a pointer.
In the second case, we talk about \emph{small scalars} instead of scalars because we can only represent $2^{n-1}$ values instead of the $2^n$ values representable when all bits are available. For pointers, we do not lose anything: as they need to be \emph{aligned} anyway, the least significant bit is always zero.
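This scheme can be simulated explicitly (a sketch in Rust, not the actual runtime code of any of these systems): a machine word is either a small scalar shifted left with its low bit set to 1, or an aligned pointer whose low bit is naturally 0:

```rust
// Simulating one-bit pointer tagging on a machine word.

// A small scalar n is stored as (n << 1) | 1: the low bit is the tag.
fn tag_scalar(n: isize) -> usize {
    ((n as usize) << 1) | 1
}

fn is_scalar(word: usize) -> bool {
    word & 1 == 1 // aligned pointers always have a zero low bit
}

fn untag_scalar(word: usize) -> isize {
    (word as isize) >> 1 // arithmetic shift restores the sign
}

fn main() {
    let w = tag_scalar(42);
    assert!(is_scalar(w));
    assert_eq!(untag_scalar(w), 42);
    // Negative small scalars round-trip too, thanks to the arithmetic shift.
    assert_eq!(untag_scalar(tag_scalar(-7)), -7);

    // An aligned address (e.g. a multiple of 8) reads as a pointer.
    let fake_ptr: usize = 0x1000;
    assert!(!is_scalar(fake_ptr));
}
```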
This technique is used by SML/NJ and OCaml:
\subsection{Disadvantages}
\paragraph{No full scalars}
This allows unboxing $2^{n-1}$ scalars, but not the full $2^n$ scalar range. This is why \commandbox{int} in OCaml is 63 bits wide (on 64-bit platforms). In practice this is not a problem, for two reasons. First, it is still possible to have \emph{boxed} values covering the full range; in OCaml this is done through the \commandbox{Int64} module on 64-bit platforms. Second, when the programmer needs values that do not fit in small scalars, it is likely that they will not fit in full scalars either, and a boxed representation for big values is going to be needed anyway, e.g. Zarith in OCaml.
\paragraph{Boxed floats}
There is no standard for 63-bit \commandbox{float} values and processors do not have instructions to operate on them. Due to the way floating-point numbers are represented, it is not possible to simply truncate and shift one bit, as is done with integers. Thus, floats need to be represented as 64-bit boxed values.
\subsection{Optimisations}
TODO: say that it's not really how it is: even with a single float, the tag needs to say it is a float, otherwise it will be interpreted as a tagged pointer...
But it is possible to use a special tag in the metadata of the array's block, indicating that the data in the block are unboxed floats. Thus, we have only one box for the array, but each float is unboxed:
Runtime monomorphization supports polymorphic recursion:
\input{ral.fsx.tex}
\end{tcolorbox}
When running the above code with \commandbox{fsharpc ral.fsx && chmod +x ral.exe && ./ral.exe}, the JIT is observable. Each time the length of the list doubles, the type of its leaves grows. This triggers the JIT and pauses the execution. This is noticeable only on the first call to \commandbox{loop Nil}: on the second call, the functions have already been monomorphized, so the execution is fast and no pause happens.