* Maximum non-constant constructors @ 2005-03-16 19:51 Tom Hawkins 2005-03-16 20:42 ` [Caml-list] " Alain Frisch 0 siblings, 1 reply; 11+ messages in thread From: Tom Hawkins @ 2005-03-16 19:51 UTC (permalink / raw) To: caml-list I just discovered another limitation: the maximum number of non-constant constructors is 246. My modified OCaml parser generator (ocamlyacc with ints -- see previous post) is producing a parser that consumes 342 different tokens types. Are there any ways to bypass this limitation? -Tom ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] Maximum non-constant constructors 2005-03-16 19:51 Maximum non-constant constructors Tom Hawkins @ 2005-03-16 20:42 ` Alain Frisch 2005-03-16 23:22 ` Tom Hawkins 2005-03-16 23:32 ` [Caml-list] " Richard Jones 0 siblings, 2 replies; 11+ messages in thread From: Alain Frisch @ 2005-03-16 20:42 UTC (permalink / raw) To: Tom Hawkins; +Cc: caml-list Tom Hawkins wrote: > I just discovered another limitation: the maximum number of non-constant > constructors is 246. > > My modified OCaml parser generator (ocamlyacc with ints -- see previous > post) is producing a parser that consumes 342 different tokens types. > > Are there any ways to bypass this limitation? Polymorphic variants. -- Alain ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] Maximum non-constant constructors 2005-03-16 20:42 ` [Caml-list] " Alain Frisch @ 2005-03-16 23:22 ` Tom Hawkins 2005-03-17 15:23 ` Stefan Monnier 2005-03-16 23:32 ` [Caml-list] " Richard Jones 1 sibling, 1 reply; 11+ messages in thread From: Tom Hawkins @ 2005-03-16 23:22 UTC (permalink / raw) To: caml-list Alain Frisch wrote: > Tom Hawkins wrote: > >> I just discovered another limitation: the maximum number of >> non-constant constructors is 246. >> >> My modified OCaml parser generator (ocamlyacc with ints -- see >> previous post) is producing a parser that consumes 342 different >> tokens types. >> >> Are there any ways to bypass this limitation? > > > Polymorphic variants. Good idea, but ocamlyacc does not accept `Some_token as valid syntax. I'm starting to think I should invest some time developing a new parser generator for OCaml. -Tom > > -- Alain > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Maximum non-constant constructors 2005-03-16 23:22 ` Tom Hawkins @ 2005-03-17 15:23 ` Stefan Monnier 0 siblings, 0 replies; 11+ messages in thread From: Stefan Monnier @ 2005-03-17 15:23 UTC (permalink / raw) To: caml-list > I'm starting to think I should invest some time developing a new parser > generator for OCaml. BTW, has anyone ported SML/NJ's ml-yacc to OCaml? I really like its automatic error recovery. Stefan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] Maximum non-constant constructors 2005-03-16 20:42 ` [Caml-list] " Alain Frisch 2005-03-16 23:22 ` Tom Hawkins @ 2005-03-16 23:32 ` Richard Jones 2005-03-17 12:36 ` Jean-Christophe Filliatre 1 sibling, 1 reply; 11+ messages in thread From: Richard Jones @ 2005-03-16 23:32 UTC (permalink / raw) To: caml-list On Wed, Mar 16, 2005 at 09:42:35PM +0100, Alain Frisch wrote: > Polymorphic variants. Has anyone ever seen (in real life) a collision in the hash function which encodes polymorphic variants? Just wondering ... It seems like it could occur. Rich. -- Richard Jones, CTO Merjis Ltd. Merjis - web marketing and technology - http://merjis.com Team Notepad - intranets and extranets for business - http://team-notepad.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] Maximum non-constant constructors 2005-03-16 23:32 ` [Caml-list] " Richard Jones @ 2005-03-17 12:36 ` Jean-Christophe Filliatre 2005-03-17 14:00 ` Eric Cooper 0 siblings, 1 reply; 11+ messages in thread From: Jean-Christophe Filliatre @ 2005-03-17 12:36 UTC (permalink / raw) To: Richard Jones; +Cc: caml-list Richard Jones writes: > > Has anyone ever seen (in real life) a collision in the hash function > which encodes polymorphic variants? Just wondering ... It seems like > it could occur. Yes it could occur, but there is a check a link time for such collisions. -- Jean-Christophe (http://www.lri.fr/~filliatr) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] Maximum non-constant constructors 2005-03-17 12:36 ` Jean-Christophe Filliatre @ 2005-03-17 14:00 ` Eric Cooper 2005-03-17 17:32 ` Marcin 'Qrczak' Kowalczyk 2005-03-17 17:55 ` Jon Harrop 0 siblings, 2 replies; 11+ messages in thread From: Eric Cooper @ 2005-03-17 14:00 UTC (permalink / raw) To: caml-list On Thu, Mar 17, 2005 at 01:36:35PM +0100, Jean-Christophe Filliatre wrote: > > Richard Jones writes: > > > > Has anyone ever seen (in real life) a collision in the hash function > > which encodes polymorphic variants? Just wondering ... It seems like > > it could occur. > > Yes it could occur, but there is a check a link time for such collisions. Out of curiousity, I wrote a short program to enumerate possible collisions. (If you examine the hash_variant function in typing/btype.ml, it's clear that any base-223 representation of a multiple of 2^31 in which the "digits" are legal identifier characters will hash to zero, and will therefore be an "invisible prefix".) For example, for all strings XXX, the variants `XXX and `zyctRecXXX collide. Here's a small sampling of other invisible prefixes: CibPXd UMEDdm d1usNc hS1P1' jagJhn oZshTt Atmtemb CAoStes DHobutv PeoQMeo SQufoxX alzzdgn dRtEXEl eeAnNdc glMavfi stYKKKs vbasThr My guess is that none of them are likely to be chosen by humans, but might occur in program-generated code. -- Eric Cooper e c c @ c m u . e d u ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] Maximum non-constant constructors 2005-03-17 14:00 ` Eric Cooper @ 2005-03-17 17:32 ` Marcin 'Qrczak' Kowalczyk 2005-03-17 18:27 ` Richard Jones 2005-03-17 17:55 ` Jon Harrop 1 sibling, 1 reply; 11+ messages in thread From: Marcin 'Qrczak' Kowalczyk @ 2005-03-17 17:32 UTC (permalink / raw) To: caml-list Eric Cooper <ecc@cmu.edu> writes: > (If you examine the hash_variant function in typing/btype.ml, it's > clear that any base-223 representation of a multiple of 2^31 in > which the "digits" are legal identifier characters will hash to > zero, and will therefore be an "invisible prefix".) Beware of the birthday paradox: the probability of finding two values with the same hash is much larger than the probability of finding a single value with the given hash (a square root of the previous one). -- __("< Marcin Kowalczyk \__/ qrczak@knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] Maximum non-constant constructors 2005-03-17 17:32 ` Marcin 'Qrczak' Kowalczyk @ 2005-03-17 18:27 ` Richard Jones 0 siblings, 0 replies; 11+ messages in thread From: Richard Jones @ 2005-03-17 18:27 UTC (permalink / raw) Cc: caml-list On Thu, Mar 17, 2005 at 06:32:06PM +0100, Marcin 'Qrczak' Kowalczyk wrote: > Beware of the birthday paradox: the probability of finding two values > with the same hash is much larger than the probability of finding a > single value with the given hash (a square root of the previous one). Yes, I was thinking about this too: http://en.wikipedia.org/wiki/OCaml#Code_examples (Good excuse to get rid of the terrible "99 bottles of beer" example too). Rich. -- Richard Jones, CTO Merjis Ltd. Merjis - web marketing and technology - http://merjis.com Team Notepad - intranets and extranets for business - http://team-notepad.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] Maximum non-constant constructors 2005-03-17 14:00 ` Eric Cooper 2005-03-17 17:32 ` Marcin 'Qrczak' Kowalczyk @ 2005-03-17 17:55 ` Jon Harrop 2005-03-19 2:28 ` Jacques Garrigue 1 sibling, 1 reply; 11+ messages in thread From: Jon Harrop @ 2005-03-17 17:55 UTC (permalink / raw) To: caml-list On Thursday 17 March 2005 14:00, Eric Cooper wrote: > For example, for all strings XXX, the variants `XXX and `zyctRecXXX > collide. So they do: # type a = [ `ABC | `zyctRecABC ];; Variant tags `ABC and `zyctRecABC have same hash value. Change one of them. # > My guess is that none of them are likely to be chosen by > humans, but might occur in program-generated code. Good point! I think this justifies the existence of a library function to compare the hashes of string names of polymorphic variants. Does such a function exist? -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. Objective CAML for Scientists http://www.ffconsultancy.com/products/ocaml_for_scientists ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Caml-list] Maximum non-constant constructors 2005-03-17 17:55 ` Jon Harrop @ 2005-03-19 2:28 ` Jacques Garrigue 0 siblings, 0 replies; 11+ messages in thread From: Jacques Garrigue @ 2005-03-19 2:28 UTC (permalink / raw) To: jon; +Cc: caml-list From: Jon Harrop <jon@ffconsultancy.com> > On Thursday 17 March 2005 14:00, Eric Cooper wrote: > > For example, for all strings XXX, the variants `XXX and `zyctRecXXX > > collide. > > My guess is that none of them are likely to be chosen by > > humans, but might occur in program-generated code. > > Good point! I think this justifies the existence of a library function to > compare the hashes of string names of polymorphic variants. Does such a > function exist? Here it is: let hash_variant s = let accu = ref 0 in for i = 0 to String.length s - 1 do accu := 223 * !accu + Char.code s.[i] done; (* reduce to 31 bits *) accu := !accu land (1 lsl 31 - 1); (* make it signed for 64 bits architectures *) if !accu > 0x3FFFFFFF then !accu - (1 lsl 31) else !accu It is also available in the C runtime (where it is useful for interfacing) but if you really need it in your program just copy the above code. To give you an idea of the probability of conflict, here are the conclicts I found in a 235882 word dictionary (/usr/share/dict/web2 on FreeBSD) Tag `Cretacic conflicts with `cornigerous Tag `gedackt conflicts with `aquicolous Tag `myeloencephalitis conflicts with `adequative Tag `Nemorensian conflicts with `condor Tag `nonrioter conflicts with `anematosis Tag `omphaloma conflicts with `crimelessness Tag `persecute conflicts with `paraenetic Tag `soredioid conflicts with `genitorial Tag `subpopulation conflicts with `oxyquinone Tag `sympathy conflicts with `herbman Tag `trophaeum conflicts with `prepossession Tag `unbreaded conflicts with `neback Tag `undragooned conflicts with `nuisance Tag `unlistened conflicts with `disturb Tag `veratrine conflicts with `absolutely You will get different conflict if you change the capitalization, but not more. So much for the birthday paradox: variants fully use 31 bits, so that the probability only gets more than 0.5 when you have more than 32000 constructors in the same type... Of course this is assuming a perfect distribution, but as the above dictionary shows, reality is close to that. By the way, the check is not at link time, as was stated in another message, but at compile time. It is the typing of variants itself that guarantees that no such conflict will go undetected. So you will be warned early enough. The same hash function is also used for method names (with the same compile-time check), but fortunately it is hard to imagine an object with more than 10000 methods. Jacques Garrigue ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2005-03-19 2:28 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2005-03-16 19:51 Maximum non-constant constructors Tom Hawkins 2005-03-16 20:42 ` [Caml-list] " Alain Frisch 2005-03-16 23:22 ` Tom Hawkins 2005-03-17 15:23 ` Stefan Monnier 2005-03-16 23:32 ` [Caml-list] " Richard Jones 2005-03-17 12:36 ` Jean-Christophe Filliatre 2005-03-17 14:00 ` Eric Cooper 2005-03-17 17:32 ` Marcin 'Qrczak' Kowalczyk 2005-03-17 18:27 ` Richard Jones 2005-03-17 17:55 ` Jon Harrop 2005-03-19 2:28 ` Jacques Garrigue
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox