* [Caml-list] Strings as arrays or lists... @ 2003-02-27 22:31 Oliver Bandel 2003-02-28 1:03 ` brogoff 0 siblings, 1 reply; 9+ messages in thread From: Oliver Bandel @ 2003-02-27 22:31 UTC (permalink / raw) To: caml-list Hello, in Haskell, strings are lists of chars. In Ocaml, strings seem to be array-like, but it's not possible to apply Array-functions on Strings, but nevertheless, the char's of a string are used in a very similar way, as arrays. The indexing is done with .(idx) in the one case and with .[idx] in the other case. Well, when strings would be lists of chars, List-commands like List.map and List.filter could be used. This would be more FPL-like. (And more convenient and IMHO more consistently.) But even if only the imperative Array-access would be possible (instead of the FPL-like Lists), it would be more convenient then to have such a string-type, which can't be used with the many libraries. Any suggestions on that topic? Ciao, Oliver ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Strings as arrays or lists... 2003-02-27 22:31 [Caml-list] Strings as arrays or lists Oliver Bandel @ 2003-02-28 1:03 ` brogoff 2003-03-02 18:34 ` Xavier Leroy 0 siblings, 1 reply; 9+ messages in thread From: brogoff @ 2003-02-28 1:03 UTC (permalink / raw) To: Oliver Bandel; +Cc: caml-list On Thu, 27 Feb 2003, Oliver Bandel wrote: > Hello, > > > in Haskell, strings are lists of chars. And what a horrible design decision that is! Even the other lazy language (Clean) doesn't screw this one up, and last I looked at it had strings as unboxed character arrays. > In Ocaml, strings seem to be array-like, > but it's not possible to apply Array-functions > on Strings, but nevertheless, the char's of a > string are used in a very similar way, as arrays. Think of them as packed character arrays. > The indexing is done with .(idx) in the one case > and with .[idx] in the other case. OK, so some of us will be much happier if (when?) OCaml implements extensional polymorphism and we can treat all array like thingies uniformly. > Well, when strings would be lists of chars, > List-commands like List.map and List.filter > could be used. Oh really? And what do we do when we want to use List.map on a long string? On my machine, Sys.max_string_length is 16_777_211. List.map will crap out on a much smaller list. > This would be more FPL-like. (And more convenient > and IMHO more consistently.) It would also be way slower, and restrict strings to much smaller lengths. The length restriction can (and should!) be fixed by extending the set of functions which are tail recursive, as has been discussed on this list in the last month. However, I think array like representations for strings make more sense, and if "FPL" means "handles arrays poorly" then I think the flaw is in FPLs. > But even if only the imperative Array-access would > be possible (instead of the FPL-like Lists), it would > be more convenient then to have such a string-type, > which can't be used with the many libraries. I can't quite parse this, but if you mean that you'd like to treat strings like arrays notationally, I agree. Agitate for extensional polymorphism. Wow, that sounds even better than "eschew obfuscation"! > Any suggestions on that topic? There is no one string representation which will satisfy everyone. I prefer the built in one to be array based, like in OCaml. You can easily write a function like this let explode s = let rec exap n l = if n < 0 then l else exap (n - 1) ((String.get s n)::l) in exap (String.length s - 1) [] to get the char list you want. You couldn't get the packed char array from a char list if it wasn't built in. -- Brian ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Strings as arrays or lists... 2003-02-28 1:03 ` brogoff @ 2003-03-02 18:34 ` Xavier Leroy 2003-03-02 19:03 ` Alain.Frisch ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Xavier Leroy @ 2003-03-02 18:34 UTC (permalink / raw) To: brogoff; +Cc: Oliver Bandel, caml-list > > in Haskell, strings are lists of chars. > > And what a horrible design decision that is! Agreed. Well, it's a great way to multiply the memory requirements for your strings by a factor of 12 (on 32-bit platforms) or 24 (on 64-bit platforms), while at the same time losing constant-time indexing :-) Actually, the list representation of strings is so repugnant that I don't even want to include "explode" and "implode" coercions between string and char list in the standard library. A standard library should steer users away from algorithmically-inefficient code. By not having implode and explode in the library, I hope OCaml programmers will come to the realization that the proper way to operate on strings is either via block operations (the String module, regexps, etc), or by recursion on integer indices. - Xavier Leroy ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Strings as arrays or lists... 2003-03-02 18:34 ` Xavier Leroy @ 2003-03-02 19:03 ` Alain.Frisch 2003-03-03 8:50 ` Luc Maranget 2003-03-04 2:49 ` Eric C. Cooper 2 siblings, 0 replies; 9+ messages in thread From: Alain.Frisch @ 2003-03-02 19:03 UTC (permalink / raw) To: Xavier Leroy; +Cc: brogoff, Oliver Bandel, caml-list On Sun, 2 Mar 2003, Xavier Leroy wrote: > > > in Haskell, strings are lists of chars. > > > > And what a horrible design decision that is! > > Agreed. Well, it's a great way to multiply the memory requirements > for your strings by a factor of 12 (on 32-bit platforms) or 24 (on > 64-bit platforms), while at the same time losing constant-time > indexing :-) Not necessarily. Considering strings as lists of chars is a design decision about the semantics of the language, not its implementation. For instance in CDuce, we choosed to define the type String as [Char*], that is, homogeneous sequences of (Unicode) characters. The idea is that we want to have characters and elements at the same level inside XML elements (the other solution is to have elements and strings; but when you concatenate two sequences of elements-or-strings, you want to perform implicit concatenation of strings at the middle to avoid two consecutive strings). For the implementation, we introduce a compact representation of strings at runtime, namely a pointer to a segment of a buffer (OCaml string) + a "continuation" (which can be a representation of the same kind, or a real list cell). When a pattern tries to deconstruct such a representation to see it as list cell (head,tail), the runtime system extracts the first character from the buffer and compute a pointer to the next character (depending on the internal Unicode encoding of the buffer). When a pattern extracts a consecutive subsequence (substrings) - patterns in CDuce are roughly regular expressions - it actually returns a pointer to a subsegment of the buffer. Concatenation of sequences is computed lazily, to avoid quadractic behavior when appending elements one by one at the end. Basically, in many situations, we avoid space overhead and keep a reasonable time overhead. I'm not advocating this kind of techniques for a fully general language such as OCaml; I just wanted to defend the "horrible" design for specific applications... -- Alain ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Strings as arrays or lists... 2003-03-02 18:34 ` Xavier Leroy 2003-03-02 19:03 ` Alain.Frisch @ 2003-03-03 8:50 ` Luc Maranget 2003-03-03 17:12 ` brogoff 2003-03-04 2:49 ` Eric C. Cooper 2 siblings, 1 reply; 9+ messages in thread From: Luc Maranget @ 2003-03-03 8:50 UTC (permalink / raw) To: Xavier Leroy; +Cc: brogoff, Oliver Bandel, caml-list > > > > in Haskell, strings are lists of chars. > > > > And what a horrible design decision that is! > > Agreed. Well, it's a great way to multiply the memory requirements > for your strings by a factor of 12 (on 32-bit platforms) or 24 (on > 64-bit platforms), while at the same time losing constant-time > indexing :-) > > Actually, the list representation of strings is so repugnant that I > don't even want to include "explode" and "implode" coercions between > string and char list in the standard library. A standard library > should steer users away from algorithmically-inefficient code. By not > having implode and explode in the library, I hope OCaml programmers > will come to the realization that the proper way to operate on strings > is either via block operations (the String module, regexps, etc), or > by recursion on integer indices. > > - Xavier Leroy > Xavier is right, of course. However, in a lazy context, seeing strings as list of chars has some advantages. This is not relevant to Caml anyway. --Luc ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Strings as arrays or lists... 2003-03-03 8:50 ` Luc Maranget @ 2003-03-03 17:12 ` brogoff 2003-03-03 17:40 ` Diego Olivier Fernandez Pons 0 siblings, 1 reply; 9+ messages in thread From: brogoff @ 2003-03-03 17:12 UTC (permalink / raw) To: Luc Maranget; +Cc: Xavier Leroy, Oliver Bandel, caml-list Yes, Xavier is right here, and I apologize for posting an "explode" function on this list. I'll appeal for leniency, mentioning that I didn't post "implode", and that I only posted the former function to demonstrate that it could be done, and to point out that one of the many reasons strings as char lists is wrong as a basic type is that you get an abstraction inversion. As far as char lists being somewhat advantageous in a lazy language, well, I won't start a flamewar as to whether laziness as the default is a good design decision (oh hell, I'll admit, I think it isn't) but I'll repeat my observation that in the Clean language, also lazy by default like Haskell, strings are unboxed character arrays. Back to the topic which interests us, OCaml. One thing I'd like to see in an extended library is a fairly rich substring library, perhaps like the one in the SML Basis. I have the beginnings of one, and I also ported the SML one from Moscow ML to OCaml. It can certainly be made richer, the first improvement on my list being an integration with a regexp package. I'm sure there are numerous ideas just for string libraries, and that we could fill an entire mailing list just with those. Ropes (a binary tree representation for applicative "big strings") and extended character sets (I guess Camomile is doing that now?) are my favorites. -- Brian On Mon, 3 Mar 2003, Luc Maranget wrote: > > > > in Haskell, strings are lists of chars. > > > > > > And what a horrible design decision that is! > > > > Agreed. Well, it's a great way to multiply the memory requirements > > for your strings by a factor of 12 (on 32-bit platforms) or 24 (on > > 64-bit platforms), while at the same time losing constant-time > > indexing :-) > > > > Actually, the list representation of strings is so repugnant that I > > don't even want to include "explode" and "implode" coercions between > > string and char list in the standard library. A standard library > > should steer users away from algorithmically-inefficient code. By not > > having implode and explode in the library, I hope OCaml programmers > > will come to the realization that the proper way to operate on strings > > is either via block operations (the String module, regexps, etc), or > > by recursion on integer indices. > > > > - Xavier Leroy > > > > > Xavier is right, of course. > However, in a lazy context, seeing strings as list of chars has some > advantages. This is not relevant to Caml anyway. > > --Luc > > ------------------- > To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr > Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Strings as arrays or lists... 2003-03-03 17:12 ` brogoff @ 2003-03-03 17:40 ` Diego Olivier Fernandez Pons 0 siblings, 0 replies; 9+ messages in thread From: Diego Olivier Fernandez Pons @ 2003-03-03 17:40 UTC (permalink / raw) To: caml-list Bonjour, Brian Rogoff wrote : > As far as char lists being somewhat advantageous in a lazy language, > well, I won't start a flamewar as to whether laziness as the default > is a good design decision (oh hell, I'll admit, I think it isn't) > but I'll repeat my observation that in the Clean language, also lazy > by default like Haskell, strings are unboxed character arrays. Caml is a 'most of the time strict but lazy when you need' language whereas Haskell is a 'most of the time lazy but strict when you need' one. And Caml already provides support for lazy lists by means of the Stream module, which is pretty good and convenient. > I'm sure there are numerous ideas just for string libraries, and > that we could fill an entire mailing list just with those. Ropes (a > binary tree representation for applicative "big strings") and > extended character sets (I guess Camomile is doing that now?) are my > favorites. Hum... Fast catenable arrays are the subject of Ralf Hinze last paper 'bootstrapping one-side flexible arrays'. He uses weight balanced trees of imperative arrays. It also makes a good representation for (precomputed) streams. Diego Olivier ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Strings as arrays or lists... 2003-03-02 18:34 ` Xavier Leroy 2003-03-02 19:03 ` Alain.Frisch 2003-03-03 8:50 ` Luc Maranget @ 2003-03-04 2:49 ` Eric C. Cooper 2003-03-04 8:29 ` Fabrice Le Fessant 2 siblings, 1 reply; 9+ messages in thread From: Eric C. Cooper @ 2003-03-04 2:49 UTC (permalink / raw) To: caml-list On Sun, Mar 02, 2003 at 07:34:37PM +0100, Xavier Leroy wrote: > Actually, the list representation of strings is so repugnant that I > don't even want to include "explode" and "implode" coercions between > string and char list in the standard library. A standard library > should steer users away from algorithmically-inefficient code. By not > having implode and explode in the library, I hope OCaml programmers > will come to the realization that the proper way to operate on strings > is either via block operations (the String module, regexps, etc), or > by recursion on integer indices. I recently wrote some code that made use of char lists, and explode and implode came in handy. I needed to marshal a recursive datatype into a packet to be sent over a communication channel according to a protocol that imposed a specific format, including a length byte at the beginning and a checksum byte at the end. I could have made one pass over the data to compute the packet length, then a second pass marshaling it into a buffer. But it was very natural to just build up a list of bytes in a single traversal of the datatype. Then the length and checksum could easily be added to the beginning and the end, and the result written out. I used explode when I encountered string values at the leaves of the datatype. I didn't really need implode (I could just iterate output_byte over the final list), but it came in handy for dumping packets for debugging. I wasn't really using char lists as a representation of strings, but rather as a buffer-like data structure that just happened to need conversion to and from strings at certain points. As another poster pointed out, explode and implode are analogous to Array.to_list and Array.of_list, which don't seem to entice OCaml programmers down the path of algorithmic ineffiency. -- Eric C. Cooper e c c @ c m u . e d u ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Caml-list] Strings as arrays or lists... 2003-03-04 2:49 ` Eric C. Cooper @ 2003-03-04 8:29 ` Fabrice Le Fessant 0 siblings, 0 replies; 9+ messages in thread From: Fabrice Le Fessant @ 2003-03-04 8:29 UTC (permalink / raw) To: Eric C. Cooper; +Cc: caml-list > I recently wrote some code that made use of char lists, and explode > and implode came in handy. I needed to marshal a recursive datatype > into a packet to be sent over a communication channel according to a > protocol that imposed a specific format, including a length byte at > the beginning and a checksum byte at the end. > > I could have made one pass over the data to compute the packet length, > then a second pass marshaling it into a buffer. But it was very > natural to just build up a list of bytes in a single traversal of the > datatype. Then the length and checksum could easily be added to the > beginning and the end, and the result written out. I had exactly the same code to write one year ago, and I simply built the packet inside a buffer, with a 0 length-field, changed it to a string, and filled the length-field just before sending, by changing the chars in place in the string. It only allocates the buffer once, and then the final string at the end, I cannot believe that using a list of chars can be more efficient, but sometimes strange things happen... - Fabrice ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2003-03-04 9:20 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-02-27 22:31 [Caml-list] Strings as arrays or lists Oliver Bandel 2003-02-28 1:03 ` brogoff 2003-03-02 18:34 ` Xavier Leroy 2003-03-02 19:03 ` Alain.Frisch 2003-03-03 8:50 ` Luc Maranget 2003-03-03 17:12 ` brogoff 2003-03-03 17:40 ` Diego Olivier Fernandez Pons 2003-03-04 2:49 ` Eric C. Cooper 2003-03-04 8:29 ` Fabrice Le Fessant
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox