Re: [Caml-list] Separate compilation

From: "Edwin Török" <edwin+ml@etorok.net>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Separate compilation
Date: Mon, 07 Apr 2025 22:55:33 +0100
Message-ID: <fbbfa7031a1e68afdc166c227355aef8440c7dd4.camel@etorok.net>
In-Reply-To: <Z/CxDairaiIpSICa@Magus.localnet>

On Sat, 2025-04-05 at 13:26 +0900, Oleg wrote:
> 
> In designing separate compilation, OCaml has decided to restrict the
> correspondence between separately compiled modules and their
> interfaces to one-to-one. The restriction is indeed limiting: making
> the linking with alternative/improved implementations, and the
> implementation extension/evolution ungainly (in fact, seemingly
> impossible).
> 
> To be concrete: suppose we have an interface A.mli and two
> implementations of it: A1.ml and A2.ml. We also have the user code
> B.ml:
>         open A
>         ... using the operations of A
> We want to compile B.ml and link that B.cmo with either
> implementation. As posed, the problem seem unsolvable: we need either
> modify the source B.ml to refer to a particular implementation, or
> turn B.ml into a functor (which could be inconvenient, and also
> suboptimal: calls to A operations become indirect).

Thanks for exploring solutions to this issue.

FWIW the functor approach is what Mirage uses, but the functors spread
across the codebase and can become quite big, e.g. a tutorial HTTP
application has a functor that takes 5 modules:
https://github.com/mirage/mirage-skeleton/blob/main/applications/http/unikernel.ml#L32

This is the code to compose it all (IIUC it runs at build time):
https://github.com/mirage/mirage-skeleton/blob/main/applications/http/config.ml
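
To make the functor route concrete, here is a minimal single-file
sketch. All names (S, A1, A2, Make_b, describe) are hypothetical and
only stand in for A.mli, its two implementations and the user code B;
this is not Mirage's actual code.

```ocaml
(* Hypothetical stand-in for the shared interface A.mli. *)
module type S = sig
  val name : string
  val scale : int -> int
end

(* Two alternative implementations of the same interface. *)
module A1 : S = struct
  let name = "A1"
  let scale x = 2 * x
end

module A2 : S = struct
  let name = "A2"
  let scale x = 3 * x
end

(* The user code B turned into a functor: inside Make_b, the operations
   of A are reached through the functor argument. *)
module Make_b (A : S) = struct
  let describe x = Printf.sprintf "%s: %d" A.name (A.scale x)
end

(* The choice of implementation is made where the functor is applied. *)
module B1 = Make_b (A1)
module B2 = Make_b (A2)

let () =
  print_endline (B1.describe 7);  (* prints "A1: 14" *)
  print_endline (B2.describe 7)   (* prints "A2: 21" *)
```

With a single parameter this is manageable; the inconvenience shows up
once every user of B has to become a functor too, so that the parameter
can be threaded through.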

An alternative is Dune with virtual libraries:
https://dune.readthedocs.io/en/stable/virtual-libraries.html
This doesn't require functors, but it prevents cross-module inlining
from working at all
(indeed the actual implementation may only be known at the time the
final application is linked). 
However the module chosen *is* still known at application build time,
so it is a bit unfortunate to have to choose between functors (where
perhaps Flambda2 can help remove the overhead) and the loss of inlining
(not sure whether Flambda2 can help with this one, due to -opaque).

Nowadays even GCC does LTO by default at application link time on some
systems (which in certain cases also increases OCaml link times
considerably as the entire OCaml C runtime is processed by LTO).
We already have separate development and release build profiles (in
Dune) that make different tradeoffs between compilation speed and
application runtime speed.
So doing more work at link time (even recompiling modules) may not
necessarily be a bad idea (as long as that work is parallelizable, with
itself or the C LTO).

> 
> For the next problem, assume the interface U.mli and its
> implementation U.ml. We want to extend them -- say, add a new
> operation to the interface and the implementation -- without
> modifying U.mli/U.ml and without cut-and-paste. That is, we want
> merely `include' U and add the new operation -- making a `diff' so to
> speak. If U.mli abstracts away the implementation details, and if
> they are needed to implement the new operation, the problem seem
> unsolvable.
> 
> It turns out there are work-arounds for both problems, which solve
> them, albeit not elegantly. The full explanation is a bit too long
> for this message; please see
>         https://okmij.org/ftp/ML/module-extensibility.html

It would be interesting to also explore how this affects 'ocamlopt',
where the various choices also determine whether inlining works (with a
potential impact on runtime performance).
The use of -opaque might be needed in some cases to avoid tying users
of a module to a particular implementation too early (through cross-
module inlining, which then effectively makes it incompatible with
other implementations, should it be switched out later).

*If* we want both separate compilation, and good runtime performance
(cross-module inlining) I think we'd have to give up part of separate
compilation, and move code generation to link time, at least for
functions that (transitively) cross such a multi-implementation
interface boundary.
(we can't really recompile all users for each implementation as that
might lead to a combinatorial explosion)

For some situations there is another alternative: build contexts.

If you want to consistently choose implementation X for the interface
A, then all libraries/applications built in that build context could be
configured to do the necessary symlinking/renaming to treat X as the
implementation of A with cross-module inlining intact.
(This is effectively what Mirage has to do for cross-compilation
anyway, so then why does it still need functors...)
Configurations that don't want to use a complicated build setup (or
that favour quick edit-compile cycles) could fall back to using -opaque
and deferring the choice to link time, at the cost of losing inlining.

This only works if X itself doesn't depend on more such
multi-implementation interfaces (or, if it does, the overall number of
configurations is small, i.e. not the full Cartesian product).
At the time of building/installing library Y you could then precompile
it for all the N build contexts (perhaps dependency caching/hashing
should be able to tell you where you can reuse build artifacts from
other contexts).
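
To put a number on the worst case, here is a small sketch. The
interface names and implementation counts are made up for illustration;
the point is only that N is, at worst, the product of the number of
implementations of each multi-implementation interface.

```ocaml
(* Hypothetical multi-implementation interfaces, each paired with the
   number of implementations available for it. *)
let implementations = [ ("A", 2); ("Clock", 3); ("Net", 2) ]

(* Worst case: every combination is its own build context, so the count
   is the product of the individual counts. *)
let full_product =
  List.fold_left (fun acc (_, n) -> acc * n) 1 implementations

let () =
  Printf.printf "worst case: %d build contexts\n" full_product
  (* prints "worst case: 12 build contexts" *)
```

In practice only a handful of those combinations would be meaningful,
which is what keeps precompiling per context feasible.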

I think the use of build contexts would actually be compatible with
your proposal. All that would change is build order and whether -opaque
is used or not:
* without build contexts you'd build the opaque interface first, then
  its users, then its implementation, then link the final application
* with build contexts you'd build the interface (without -opaque!), one
  of its implementations, then all users of that implementation, then
  link the final application (changing the implementation would then
  require recompiling all its users transitively, but that'd be an
  acceptable price for better performance)

> 
> The work-arounds are simple. Should OCaml developers be so inclined,
> they may be incorporated in OCaml.
> 

I'd be interested to hear the opinion of Mirage unikernel users, as
they're the most likely to be affected by the current situation, and to
benefit from this improvement (I'm not a heavy Mirage unikernel user
myself -- yet).

Best regards,
--Edwin
