From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id F1F3DBC37 for ; Wed, 27 Jan 2010 16:26:23 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AukBAJ7pX0vZSMDji2dsb2JhbACBNJogAQEBCgsKBxEFvDWEOQQ X-IronPort-AV: E=Sophos;i="4.49,354,1262559600"; d="scan'208";a="54789993" Received: from fmmailgate02.web.de ([217.72.192.227]) by mail4-smtp-sop.national.inria.fr with ESMTP; 27 Jan 2010 16:26:23 +0100 Received: from smtp06.web.de (fmsmtp06.dlan.cinetic.de [172.20.5.172]) by fmmailgate02.web.de (Postfix) with ESMTP id 42AE714CA386B; Wed, 27 Jan 2010 16:26:22 +0100 (CET) Received: from [95.208.117.111] (helo=frosties.localdomain) by smtp06.web.de with asmtp (TLSv1:AES256-SHA:256) (WEB.DE 4.110 #314) id 1Na9mf-0000Yw-00; Wed, 27 Jan 2010 16:26:22 +0100 Received: from mrvn by frosties.localdomain with local (Exim 4.71) (envelope-from ) id 1Na9mf-0002PK-IQ; Wed, 27 Jan 2010 16:26:21 +0100 From: Goswin von Brederlow To: Christophe Papazian Cc: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Alignment of data References: <5876A229-025E-47CE-B02F-4B00CD26BFAB@gmail.com> Date: Wed, 27 Jan 2010 16:26:21 +0100 In-Reply-To: <5876A229-025E-47CE-B02F-4B00CD26BFAB@gmail.com> (Christophe Papazian's message of "Wed, 27 Jan 2010 13:03:33 +0100") Message-ID: <87k4v336xu.fsf@frosties.localdomain> User-Agent: Gnus/5.110009 (No Gnus v0.9) XEmacs/21.4.22 (linux, no MULE) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: goswin-v-b@web.de X-Sender: goswin-v-b@web.de X-Provags-ID: V01U2FsdGVkX1/X/ItUtu8x8oWnGvXHelAK0/9zw3PoJeWmChI9 7Mk/vyx5to+R+E2BIwdPqQ/lYXm5a44w/pcdIiOAQpco4hoI0l UjWBgQcCg= X-Spam: no; 0.00; alignment:01 christophe:01 christophe:01 ocaml:01 alignment:01 ocamlopt:01 aligned:01 aligned:01 ocamlopt:01 stubs:01 printf:01 printf:01 pointer:01 integers:01 arrays:01 Christophe Papazian writes: > Dear users and developers of OCAML, > > I am working on some ppc architecture, and I realize that I have a > (very) big slowdown due to bad alignment of data by ocamlopt. I need > to have my data aligned in memory depending of the size of the data : > floats are to be aligned on 8 bytes, int on 4 bytes, etc.... > BUT, after verification, I remark that ocamlopt doesn't align as I > need. I tried to use ARCH_ALIGN_DOUBLE, but it doesn't seem to be what > I thought, and doesn't change anything for my needs. Is there ANY way > to obtain what I need easily or at least quickly ? > > You can use the following code to test your alignment on your > architecture : > [ compile with ocamlopt align_stubs.c align.ml -o align ] > > ######### align.ml ######### > open Obj > > external get_addr : 'a -> int * string = "get_addr" > > let rec align acc r = > if r mod 2 = 1 then acc else align (acc*2) (r/2) > > let get_addr_print v = let a,b = get_addr v in Printf.printf "%6X %s > \n" a b; a That will cut of the upper bits of my address. Not important for alignment but bad practice. > let rec get_align acc = function > h::q as l -> get_align (acc lor get_addr_print l) q > | [] -> acc > > let f block s l = > let r = > if block then (* if the element is a block, consider it like a > pointer *) > List.fold_left (fun r e -> r lor get_addr_print e) 0 l > else get_align 0 l > in > Printf.printf "%s are aligned on %i bytes\n%!" s (align 1 r) > > let build_list v l = List.map (fun i -> Array.make i v) l > > let main = > f false "Chars" ['a';'b';'c';'d';'e']; > f false "Integers" [0;1;2;3;4]; > f true "Floats" [0.;1./.3.;2./.5.;3./.7.;4./.9.]; > f true "Int Arrays" (build_list 37 [3;4;5;6;7]); > f true "Float Arrays" (build_list (1./.3.) [2;3;4;5;6]); > f true "Other Float Arrays" [Array.make 1 max_float;Array.make 2 > 0.;Array.make 3 0.;Array.make 37 0.;Array.make 17 0.]; > > ####### align_stubs.c ######## > > #include > > #include > #include > #include > #include > > CAMLprim > value get_addr(value v) > { > CAMLparam1 (v); > char *repr = malloc(9); > value res = alloc_tuple(2); > Field(res,0) = Val_int((unsigned int) v); > sprintf(repr,"%8X", *((int*)v)); Again cutting of upper bits. I have a 64bit cpu so up to 16 hex digits for an address. > Field(res,1) = (caml_copy_string(repr)); > CAMLreturn(res); > } > > ######### Results ########## > > 1D8C0 C3 > 1D8CC C5 > 1D8D8 C7 > 1D8E4 C9 > 1D8F0 CB > Chars are aligned on 4 bytes > 1D878 1 > 1D884 3 > 1D890 5 > 1D89C 7 > 1D8A8 9 > Integers are aligned on 4 bytes > 1D85C 0 > 7612C 55555555 > 76114 9999999A > 760FC DB6DB6DB > 760E4 1C71C71C > Floats are aligned on 4 bytes > 74A2C 4B > 74A18 4B > 74A00 4B > 749E4 4B > 749C4 4B > Int Arrays are aligned on 4 bytes > 732C0 55555555 > 732A4 55555555 > 73280 55555555 > 73254 55555555 > 73220 55555555 > Float Arrays are aligned on 4 bytes > 71928 FFFFFFFF > 71940 0 > 71960 0 > 71988 0 > 71AC0 0 > Other Float Arrays are aligned on 8 bytes > > You can see the addresses in memory of each element of the lists and > it's internal representation (to check > if the memory pointer really point to the right value : you can even > see that 31 bit ocaml integer (and Chars) i have a C representation of > 2*i+1). > It seems that small values > are on the minor heap, and large values are on major heap. > Note that the last array is correctly aligned, but it's just a matter > of luck : If I change > something else before this line in my code, I usually get the last > array aligned on 4 bytes. > (But I can't find a way to obtain a float array aligned on 8 bytes > with the use of "build_list") Everything is aligned to a value. I don't think there is a special alloc call for the GC that gives you double alignement. Nothing in caml/alloc.h anyway. > Si if you have any idea of how to get floats and floats arrays aligned > on 8 bytes both on major and minor heap, please answer me ! > > Thank you very much > > Christophe You need to write a new function CAMLextern value caml_alloc_double_array (mlsize_t), or similar that ensures alignment on 8 byte for double even for 32bit systems. You should also check the CAMLextern value caml_copy_double (double); that it does the same. An alternative might be to use a Bigarray. MfG Goswin