From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <romain.bardou@inria.fr>
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by sympa.inria.fr (Postfix) with ESMTPS id D4A81820A1
	for <caml-list@sympa.inria.fr>; Fri,  6 Sep 2013 17:19:35 +0200 (CEST)
X-IronPort-AV: E=Sophos;i="4.90,855,1371074400"; 
   d="scan'208";a="31816823"
Received: from dhcp-rocq-250.inria.fr (HELO [128.93.62.250]) ([128.93.62.250])
  by mail2-relais-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-CAMELLIA256-SHA; 06 Sep 2013 17:19:35 +0200
Message-ID: <5229F284.5050806@inria.fr>
Date: Fri, 06 Sep 2013 17:19:32 +0200
From: Romain Bardou <romain.bardou@inria.fr>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130116 Icedove/10.0.12
MIME-Version: 1.0
To: Markus Mottl <markus.mottl@gmail.com>
CC: "caml-list@inria.fr" <caml-list@inria.fr>
References: <5229DEF9.7040706@inria.fr> <CAP_800p=kanKKtEj6jvpYzeXm-hnpakAyCOO3s-sCtETE_f=mg@mail.gmail.com>
In-Reply-To: <CAP_800p=kanKKtEj6jvpYzeXm-hnpakAyCOO3s-sCtETE_f=mg@mail.gmail.com>
X-Enigmail-Version: 1.4.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Validation-by: romain.bardou@inria.fr
Subject: Re: [Caml-list] Accelerating compilation

On 06/09/2013 16:55, Markus Mottl wrote:
> On Fri, Sep 6, 2013 at 9:56 AM, Romain Bardou <romain.bardou@inria.fr> wrote:
>> 1) Separate typing and code generation, in ocamlc and in ocamlopt
>>
>> For instance, provide an option -typing-only which would mean "only
>> produce the .cmi but do not produce the .cmo or the .cmx". The compiler
>> would only need the .cmi of the dependencies, not their .cmo or .cmx.
>> This would make it possible to have a Makefile target, or an Ocamlbuild
>> option, to just type.
> 
> This seems like an interesting suggestion.  Code generation,
> especially for native code, can be quite expensive.  I do use byte
> code compilation during development for that reason, which is somewhat
> faster.

I considered doing that, but my project has some stubs which do not work
in bytecode for some reason (probably fixable). Also, it would mean that
I would have to compile both versions, as the program is too slow to be
used in bytecode, so the bytecode would only be used for quick
type-checking.

> Note, however, that there is one problem with the approach as
> suggested: if you have both an .mli and an .ml file, the build system
> would have to know when to "type" the latter.  In the suggested
> approach there would be no trace of this action, because the .cmi
> would come from the .mli file.  We would need to generate a separate
> dummy file in that case to visibly record the fact that the .ml file
> unifies with the .cmi file.  Or (see below) write out the typed
> abstract syntax tree of the .ml file, which would, of course, also
> have to unify with the .cmi file.

Indeed, the build system would need to be tweaked. Another approach would
be to treat .cmi files as depending on .ml files as well, perhaps only
when an option such as -just-type is passed. I'm not sure what the
implications would be.
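
As a rough sketch of that second idea, here is how the dependency rule
could look. The function cmi_dependencies and its parameters are made up
for illustration, and -just-type is only the option proposed above; none
of this exists in ocamlc or Ocamlbuild.

```ocaml
(* Hypothetical helper, not part of any real build system: lists the
   source files that the target NAME.cmi would depend on. *)
let cmi_dependencies ~just_type ~has_mli name =
  if not has_mli then
    (* No interface file: the .cmi is produced from the .ml anyway. *)
    [ name ^ ".ml" ]
  else if just_type then
    (* Proposed mode: the .ml must also be typed against the interface. *)
    [ name ^ ".mli"; name ^ ".ml" ]
  else
    (* Usual case: the .cmi comes from the .mli alone. *)
    [ name ^ ".mli" ]

let () =
  List.iter print_endline
    (cmi_dependencies ~just_type:true ~has_mli:true "foo")
```

With such a rule, touching foo.ml would make foo.cmi out of date in
just-type mode, which is exactly the part whose implications I have not
thought through.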

>> Also, provide an option -do-not-retype which would mean "if the .cmi
>> exists, load it instead of type-checking again". This would allow the
>> build process to first type-check (using -typing-only) and then generate
>> the code without type-checking again (using -do-not-retype). Of course
>> the build system should be very careful to ensure the .cmi is
>> up-to-date. This option could also help when compiling both in bytecode
>> and in native code. This option is not necessary to just find errors
>> quickly, though.
> 
> The .cmi file does not contain enough information, because it only
> contains the signature.  You need the typed AST for proper code
> generation, which might be quite big.  I haven't looked into this, but
> I wouldn't be surprised if the size of that thing could be so large
> that you might prefer type checking again over writing to + reloading
> it from a file.

Ah, right. For some reason I thought the .cmi contained the whole typed
tree, which is indeed not the case.

>> 2) Be able to disable Ocamlbuild's digest mechanism and use dates and
>> file sizes instead
>>
>> If I am not mistaken, this is one of the main reasons why Ocamlbuild is
>> slower than make. It does help to prevent useless recompilation, but
>> what good does it do to prevent a useless recompilation once in a
>> while if it is at the cost of losing a lot of time in all other cases?
>> I'm sure it is project-dependent though so it should only be an option,
>> say, -do-not-hash, or -comparison-mode dateandsize.
> 
> The problem here is that not all Unix-filesystems support sub-second
> resolution in timestamps.  So if you build a file and change a
> character in it within one second, the change may go unnoticed, i.e.
> required recompilation doesn't happen.  That's why hashing provides
> much stronger guarantees.  But I think this is not a big problem in
> practice so I'd support such a flag in ocamlbuild.
> 
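
To make that failure mode concrete, here is a minimal model in OCaml. It
is only a sketch: the record type and the two key functions are made up
for illustration and are not taken from Ocamlbuild's sources. It uses
only the standard library (Digest is the MD5 module).

```ocaml
(* Simplified model of a file as a build tool sees it. *)
type file = { mtime : float; contents : string }

(* Key used by a date-and-size comparison. [resolution] is the timestamp
   granularity of the file system in seconds (1.0 = whole seconds). *)
let date_and_size_key ~resolution f =
  (floor (f.mtime /. resolution), String.length f.contents)

(* Key used by a digest comparison: an MD5 hash of the contents. *)
let digest_key f = Digest.string f.contents

let () =
  let before = { mtime = 1000.2; contents = "let x = 1" } in
  (* One character changed half a second later: same size, same second. *)
  let after = { mtime = 1000.7; contents = "let x = 2" } in
  let same_date_and_size =
    date_and_size_key ~resolution:1.0 before
    = date_and_size_key ~resolution:1.0 after
  in
  let same_digest = digest_key before = digest_key after in
  Printf.printf "date and size see a change: %b\n" (not same_date_and_size);
  Printf.printf "digest sees a change: %b\n" (not same_digest)
```

With a one-second resolution the date-and-size key is identical before
and after the edit, so the change goes unnoticed, while the digest
differs. With a finer resolution (say 0.001) the two keys differ as well.
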
>> 3) Parallel compilation in Ocamlbuild
>>
>> Of course it would help but it is not easy to implement so I'm just
>> putting it there to be exhaustive.
> 
> I'm not sure what you are referring to, OCamlBuild does already
> support parallel builds.

Does it? I actually thought the -j option was ignored.

I just did a quick test and gained about 5 seconds with -j on a 1min15
build (I had cleaned, recompiled and cleaned again beforehand so that
file system caching would not affect the result too much), so it does
seem to be a *little* faster :)

Cheers,

-- 
Romain Bardou