* RE: [Caml-list] Big executables from ocamlopt; dynamic librariesagain
From: Dave Berry @ 2002-03-18 10:50 UTC
To: Jacques Garrigue, caml-list

Do .cmi files use hash-consing? This can greatly reduce the size of
type information.

-----Original Message-----
From: Jacques Garrigue [mailto:garrigue@kurims.kyoto-u.ac.jp]
Sent: 18 March 2002 05:20
Subject: Re: [Caml-list] Big executables from ocamlopt; dynamic librariesagain

...
If you've got a look at the size of some .cmi's, you may realize that
including required types in executables may require potentially huge
sizes.
...

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
* [Caml-list] Big executables from ocamlopt; dynamic libraries again
From: Tim Freeman @ 2002-03-16 16:05 UTC
To: caml-list

When I compile a simple "hello world" app using lablgtk, the resulting
executable exceeds 800KB. With a few apps like this, my package will
take offensively long to download, even over DSL. The executable
seems to include most of the lablgtk library, which makes sense
because lablgtk is statically linked into it.

This isn't a problem with lablgtk. A native code app that just prints
"hello world" on standard output takes 5K bytes if written in C but
95K in OCaml.

The GTK executable would be smaller if lablgtk were dynamically linked
into it. I hear that a real shared library isn't an option because
the present ocaml compiler can't generate position independent code.
However, one can do dynamic linking on many machines without position
independent code or a shared library; the dynamic linker just
relocates the library at run time.

I'd really rather write something in OCaml than the other languages
available, but if the resulting executables are so huge that I can't
distribute binaries, that's a problem. If I controlled the libraries
myself, I could use the scaml patch from
http://algol.prosalg.no/~malc/scaml/, but I'd much rather use the
lablgtk that comes in Debian than package it myself, and I'd rather
not stick my users with a redundant copy of the lablgtk library.

Is there any reason there's no support for writing dynamically
linkable libraries in OCaml?

Hmm, if you recorded the MD5 checksum of the library at compile time,
and checked it at run time, it could even be type safe. Or you could
just record the MD5 of the signature of the library, in some sense;
this would allow patches like the fix for the recent zlib double-free.
If the library knows its checksum, and the code loading it knows the
expected checksum, then you can do this checking without computing a
checksum at run time.

--
Tim Freeman
tim@fungible.com; formerly tim@infoscreen.com
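The closing idea of this message (both sides carry a checksum fixed at build time, so loading needs no hashing) can be sketched in OCaml with the standard `Digest` module, which implements MD5. This is only an illustration under stated assumptions: the interface text and the names `library_stamp`, `expected_stamp` and `check_compatible` are hypothetical and do not belong to any OCaml tool.

```ocaml
(* Sketch only: the interface text and all names here are hypothetical. *)

(* Interface of an imaginary library, written as text. *)
let interface_v1 =
  "val compress : string -> string\nval uncompress : string -> string\n"

(* In a real scheme these two stamps would be constants embedded when the
   library and the client are built. They are computed at start-up here
   only to keep the sketch self-contained. *)
let library_stamp : string = Digest.to_hex (Digest.string interface_v1)
let expected_stamp : string = Digest.to_hex (Digest.string interface_v1)

(* At load time nothing is hashed: two stored strings are compared. *)
let check_compatible ~expected ~found =
  if String.equal expected found then Ok ()
  else
    Error
      (Printf.sprintf "interface mismatch: expected %s, found %s"
         expected found)

let () =
  match check_compatible ~expected:expected_stamp ~found:library_stamp with
  | Ok () -> print_endline "library accepted"
  | Error msg -> prerr_endline msg
```

The design choice being illustrated is that the cost of the check is independent of the size of the library: only two 32-character hex strings are compared.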
* Re: [Caml-list] Big executables from ocamlopt; dynamic libraries again
From: Jacques Garrigue @ 2002-03-18 1:12 UTC
To: tim; +Cc: caml-list

From: tim@fungible.com (Tim Freeman)

> When I compile a simple "hello world" app using lablgtk, the resulting
> executable exceeds 800KB. With a few apps like this, my package will
> take offensively long to download, even over DSL. The executable
> seems to include most of the lablgtk library, which makes sense
> because lablgtk is statically linked into it.
>
> This isn't a problem with lablgtk. A native code app that just prints
> "hello world" on standard output takes 5K bytes if written in C but
> 95K in OCaml.

This is indeed a problem (a small one in my opinion, but still a
problem). But you're wrong in assuming that static linking means
including the whole library. Actually this is quite the opposite:
dynamic linking includes the whole library (but not in the
executable), while static linking only includes the needed parts. In
particular, dynamic linking without code sharing (using patched code)
uses more memory at runtime than static linking.
You can see this from the sizes: lablgtk.a is 1MB, while your app is
only 800K (including the ocaml runtime, etc.).

The unfortunate thing is that the structure of the LablGTK library
creates lots of spurious dependencies. For instance, using a button
links code for all kinds of buttons, and using a label links code for
the calendar widget... My decision was to favor meaning (putting
related things together) over size (hacks to be smaller).
The lablGTK stubs are tuned to produce small code on x86, but here
your problem is with the size of the native code produced by
ocamlopt, which is harder to optimize.

> The GTK executable would be smaller if lablgtk were dynamically linked
> into it. I hear that a real shared library isn't an option because
> the present ocaml compiler can't generate position independent code.
> However, one can do dynamic linking on many machines without position
> independent code or a shared library; the dynamic linker just
> relocates the library at run time.

You're right, but there is a bit more to dynamic linking.

A major problem with using dynamic linking with ocaml (in particular
with native code) is that your program comes cut into small pieces,
and you must be sure that they are all compatible. Somebody posted
recently about problems when upgrading ocaml, and part of it is caused
by incompatibilities in the binary format between versions.
Just imagine the reaction of your user when, after having loaded
the various packages required for your program, he only gets an error
message or a segmentation fault when trying to run it.

Static linking considerably improves that, since your program will no
longer depend on ocaml being installed, or on the versions of its
different components.

By the way, static linking here only concerns the ocaml-specific part
of the code; libgtk itself is dynamically linked, since one can expect
to find a compatible implementation on each platform.

The situation is a little bit better with bytecode: this time one only
depends on the version of ocaml, no longer on the platform.
This is only in the CVS version (next release), but you can now use
the ocaml toplevel as a dynamic loader:

Compile your application as a .cmo or .cma
    ocamlc -a -o myapp.cma tools.cmo main.cmo
(you don't need to include required libraries here)
Load it with ocaml
    ocaml lablgtk.cma myapp.cma

If speed is not a major problem, I think this can be nice in practice.

> Is there any reason there's no support for writing dynamically
> linkable libraries in OCaml?
>
> Hmm, if you recorded the MD5 checksum of the library at compile time,
> and checked it at run time, it could even be type safe. Or you could
> just record the MD5 of the signature of the library, in some sense;
> this would allow patches like the fix for the recent zlib double-free.
> If the library knows its checksum, and the code loading it knows the
> expected checksum, then you can do this checking without computing a
> checksum at run time.

One point is that, on one side, you really want the checking, and on
the other side, with the current MD5 approach, the checking is too
version dependent. That is, even if you've changed nothing in your
code, and it would be perfectly safe to run it, there is a good chance
the loader will refuse to do so.
And this nullifies the zlib example: the newly compiled version would
not be compatible with existing executables!
Of course, this could be improved, but this needs some research and/or
engineering work.

Cheers,

Jacques Garrigue
* Re: [Caml-list] Big executables from ocamlopt; dynamic libraries again
From: Tim Freeman @ 2002-03-18 1:29 UTC
To: garrigue; +Cc: caml-list

>Actually this is quite the opposite: dynamic linking
>includes the whole library (but not in the executable), while static
>linking only includes the needed parts. In particular, dynamic linking
>without code sharing (using patched code) uses more memory at runtime
>than static linking.

I agree. Even with the tradeoff you describe, I prefer dynamic
linking. I'm not too worried about memory at runtime; another 5 or
10MB will go unnoticed. If the code isn't used and there's a memory
crisis, the code will get paged out. But the speed of modems is
limited, and the space on the disk bites you even when you aren't
running the program, so the size of the download matters more in my
opinion.

>The unfortunate thing is that the structure of the LablGTK library
>creates lots of spurious dependencies.

I see the same sort of thing happening in /usr/lib/ocaml/stdlib.a, so
lablgtk is not alone there. If you make an object, you load oo.o from
stdlib.a, which defines an unrelated function that uses random
numbers, so static linking then grabs random.o.

>A major problem with using dynamic linking with ocaml (in particular
>with native code) is that your program comes cut into small pieces,
>and you must be sure that they are all compatible.

How is ocaml different from C in this regard? One difference is that
ocaml is younger and therefore changing faster, but eventually that
won't be true any more. Are there other differences?

>Static linking considerably improves that, since your program will no
>longer depend on ocaml being installed, or on the versions of its
>different components.

The C people find it prudent to offer both options. What is different
about the ocaml situation?

>..with the current MD5 approach, the checking is too version
>dependent...

Can someone point me to a description of the current MD5 approach?

>And this nullifies the zlib example: the newly compiled version would
>not be compatible with existing executables!

Is there something wrong with just checksumming the signature to
decide whether the code is compatible? That would still be more
sensitive than I'd like, since adding to the signature ideally would
not require people using the package to recompile, but it ought to
support the zlib example.

--
Tim Freeman
tim@fungible.com; formerly tim@infoscreen.com
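The question about checksumming only the signature can be made concrete with a small OCaml sketch. It assumes a toy model in which a library is just a pair of strings; the `library` record, the sample texts and the function names are all hypothetical, and the stamps are plain MD5 digests from the standard `Digest` module rather than anything the compiler produces.

```ocaml
(* Sketch only: strings stand in for a real interface and compiled code. *)
type library = { signature : string; implementation : string }

let original =
  { signature = "val inflate : string -> string\n";
    implementation = "inflate, with the double free" }

(* Bug fix: same interface, different code. *)
let patched = { original with implementation = "inflate, double free fixed" }

(* Later release: the interface gains one function. *)
let extended =
  { patched with
    signature = patched.signature ^ "val crc32 : string -> int32\n" }

(* Stamp over interface and code together. *)
let full_stamp lib = Digest.string (lib.signature ^ "\000" ^ lib.implementation)

(* Stamp over the interface only. *)
let signature_stamp lib = Digest.string lib.signature

(* A client built against [built_against] accepts [installed] when the
   chosen stamps are equal. *)
let accepts stamp ~built_against ~installed =
  Digest.equal (stamp built_against) (stamp installed)

let () =
  Printf.printf "full stamp, patched library:       %b\n"
    (accepts full_stamp ~built_against:original ~installed:patched);
  Printf.printf "signature stamp, patched library:  %b\n"
    (accepts signature_stamp ~built_against:original ~installed:patched);
  Printf.printf "signature stamp, extended library: %b\n"
    (accepts signature_stamp ~built_against:original ~installed:extended)
```

In this model the signature stamp lets the bug fix through, and it still rejects the extended library, which is the extra sensitivity the message says would ideally be avoided.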
* Re: [Caml-list] Big executables from ocamlopt; dynamic libraries again
From: Jacques Garrigue @ 2002-03-18 5:20 UTC
To: tim; +Cc: caml-list

From: tim@fungible.com (Tim Freeman)

> >A major problem with using dynamic linking with ocaml (in particular
> >with native code) is that your program comes cut into small pieces,
> >and you must be sure that they are all compatible.
>
> How is ocaml different from C in this regard? One difference is that
> ocaml is younger and therefore changing faster, but eventually that
> won't be true any more. Are there other differences?

In short: C doesn't make sure that they are compatible. If they are,
this will work; otherwise, undefined behaviour. Programmers and users
are responsible for checking (by hand!) that the API didn't change in
an incompatible way.

If you want to have both security and linking whenever it is safe,
then you would need to do lots of type-checking at link time
(runtime for dynamic linking). Basically, you would have to check
that every module you depend on has an interface at least as good as
what you need, type by type. If you've had a look at the size of some
.cmi's, you may realize that including the required types in
executables may require potentially huge sizes. And type-checking is
sometimes too slow.

As a fall-back solution, there is MD5 hashing. The problem is that
you're then mixing information for all the contents of a module.
Any change will produce a new incompatible hash value.
For instance, every time you add a function to a library, it becomes
incompatible. And there are new functions in every release of ocaml.

Note that for C, compatibility policies generally allow adding extra
functions to a library without changing the version number, since the
problem, should it arise, can be detected at link time.

And, even worse than that, the current MD5 computation scheme is
algorithm dependent: it is not based on a normalized view of types,
but just on a dump of an internal tree structure, which is extremely
sensitive to any change in the type checking algorithm. This means
that compatibility can be broken as often as once a week for the CVS
version!

I suppose one could define specific normalizing pickling and
unpickling procedures, rather than using output_value and input_value
as currently, but this would be a fair amount of work, and I'm not
even sure it would completely solve the problem.

> >And this nullifies the zlib example: the newly compiled version would
> >not be compatible with existing executables!
>
> Is there something wrong with just checksumming the signature to
> decide whether the code is compatible? That would still be more
> sensitive than I'd like, since adding to the signature ideally would
> not require people using the package to recompile, but it ought to
> support the zlib example.

The main problem is incompatibility between different versions of
ocaml: code compiled with different versions cannot be given
compatible MD5 signatures... So, yes, your zlib example would work,
but only for the 6 months between two releases of ocaml. This can be
OK if you have fair control over what you are running, but it would
be a nightmare for the average user.

So probably the real answer is that dynamic linking of caml native
code is possible, but that it would be a lot of work, not so much at
the compilation level, but more at improving compatibility checking.
One could argue that the benefits would not be limited to dynamic
linking alone, but would also include easier upgrading between ocaml
versions, so this might be worth it.

Cheers,

Jacques Garrigue
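The contrast drawn in this message, between one hash over a whole interface and a check that the interface is "at least as good as what you need", can be sketched in OCaml as follows. The model is a drastic simplification and entirely hypothetical: an interface is a list of (name, type written as text) pairs, and comparing the type strings stands in for real type checking, which is where the size and speed costs mentioned above would arise.

```ocaml
(* Sketch only: an interface is a list of (value name, type as text). *)
type interface = (string * string) list

let v1 : interface =
  [ "length", "'a list -> int"; "rev", "'a list -> 'a list" ]

(* Same library, one release later: one function was added. *)
let v2 : interface = v1 @ [ "rev_append", "'a list -> 'a list -> 'a list" ]

(* What a client compiled against v1 actually uses. *)
let needed : interface = [ "rev", "'a list -> 'a list" ]

(* One hash over everything: any addition changes it. *)
let whole_hash (i : interface) =
  Digest.string (String.concat "\n" (List.map (fun (n, t) -> n ^ " : " ^ t) i))

let hash_compatible ~built_against ~installed =
  Digest.equal (whole_hash built_against) (whole_hash installed)

(* Item-by-item check: every needed value exists with the same type. *)
let provides ~installed ~needed =
  List.for_all
    (fun (name, ty) ->
      match List.assoc_opt name installed with
      | Some ty' -> String.equal ty ty'
      | None -> false)
    needed

let () =
  Printf.printf "hash check accepts v2: %b\n"
    (hash_compatible ~built_against:v1 ~installed:v2);
  Printf.printf "item check accepts v2: %b\n" (provides ~installed:v2 ~needed)
```

The item-by-item check accepts the extended library, but it requires shipping the needed types with the client, which is the size concern raised about .cmi files.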
* Re: [Caml-list] Big executables from ocamlopt; dynamic librariesagain
From: Warp @ 2002-03-18 10:10 UTC
Cc: caml-list

> If you want to have both security and linking whenever it is safe,
> then you would need to do lots of type-checking at link time
> (runtime for dynamic linking). Basically, you would have to check
> that every module you depend on has an interface at least as good as
> what you need, type by type. If you've had a look at the size of some
> .cmi's, you may realize that including the required types in
> executables may require potentially huge sizes. And type-checking is
> sometimes too slow.
>
> As a fall-back solution, there is MD5 hashing. The problem is that
> you're then mixing information for all the contents of a module.
> Any change will produce a new incompatible hash value.
> For instance, every time you add a function to a library, it becomes
> incompatible. And there are new functions in every release of ocaml.

Is the speed really an issue? I mean... the ocaml compiler is doing
that job, and even more, right? And its speed looks quite good.
Or perhaps this is an inv-NP problem where checking against a given
signature takes exp. time when producing a valid one is easy :)

Of course, it's true that the size of a CMI grows quite fast.
But including the CMI will be better than including the CMA (if there
is no C DLL behind it; in that case, the CMI is bigger than the CMA).

> Note that for C, compatibility policies generally allow adding extra
> functions to a library without changing the version number, since the
> problem, should it arise, can be detected at link time.
>
> And, even worse than that, the current MD5 computation scheme is
> algorithm dependent: it is not based on a normalized view of types,
> but just on a dump of an internal tree structure, which is extremely
> sensitive to any change in the type checking algorithm. This means
> that compatibility can be broken as often as once a week for the CVS
> version!

That MD5 choice is of course well justified, but if it means breaking
backward compatibility, distributing precompiled ocaml binaries
becomes impossible, and then you're closing the door to many
commercial uses of ocaml, which could greatly increase the size of
the community and so the speed of ocaml development.

> So probably the real answer is that dynamic linking of caml native
> code is possible, but that it would be a lot of work, not so much at
> the compilation level, but more at improving compatibility checking.
> One could argue that the benefits would not be limited to dynamic
> linking alone, but would also include easier upgrading between ocaml
> versions, so this might be worth it.

Do you mean dynlink of native code from bytecode?
Without a CMI to ensure the type checking?

Nicolas Cannasse
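The remark quoted above, that the digest is taken over a dump of an internal tree rather than a normalized view of types, can be illustrated with a toy OCaml example. It is a sketch of the general phenomenon only, not of how the compiler writes .cmi files: the `ty` type and every name below are hypothetical. It relies on the documented behaviour of `Marshal.to_string`, which preserves sharing unless the `Marshal.No_sharing` flag is given.

```ocaml
(* Sketch only: [ty] is a toy type representation, not the compiler's. *)
type ty = Const of string | Arrow of ty * ty

(* Allocate a fresh node on every call, so nothing is shared by accident. *)
let fresh_int () = Const (String.concat "" [ "i"; "nt" ])

(* int -> int, with both sides pointing at the same node. *)
let shared = let i = fresh_int () in Arrow (i, i)

(* int -> int again, built from two separate but equal nodes. *)
let unshared = Arrow (fresh_int (), fresh_int ())

(* Digest of the raw dump: sharing is part of what gets written. *)
let dump_digest (t : ty) = Digest.string (Marshal.to_string t [])

(* Digest of a dump that ignores sharing: a crude normal form that is
   adequate for this toy type only. *)
let normalized_digest (t : ty) =
  Digest.string (Marshal.to_string t [ Marshal.No_sharing ])

let () =
  Printf.printf "structurally equal:      %b\n" (shared = unshared);
  Printf.printf "raw dump digests equal:  %b\n"
    (Digest.equal (dump_digest shared) (dump_digest unshared));
  Printf.printf "normalized digests equal: %b\n"
    (Digest.equal (normalized_digest shared) (normalized_digest unshared))
```

Two values that denote the same type can therefore receive different digests purely because of how the producing algorithm happened to build them, which is why a normalizing pickler was suggested earlier in the thread.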