* [Caml-list] Announcing Sek, an efficient implementation of sequences @ 2020-04-04 10:11 François Pottier 2020-04-04 13:17 ` Gabriel Scherer 0 siblings, 1 reply; 7+ messages in thread From: François Pottier @ 2020-04-04 10:11 UTC (permalink / raw) To: caml users Fellow OCaml users, We are pleased to announce the first release of Sek, an OCaml library that offers an efficient implementation of sequences. The library offers both ephemeral (mutable) sequences and persistent (immutable) sequences, and offers constant-time conversions between these flavors. It supports all of the standard operations on stacks, queues, deques (e.g. push, pop at either end), catenable sequences (concat, split), and random access sequences (get, set). Data is stored internally in chunks (fixed-capacity arrays), which is why this data structure is known as a chunK SEquence. It is intended to achieve excellent time complexity and memory usage. This is an initial release. The library has not been tested in production, but has received extensive unit testing, via afl-fuzz and ocaml+afl -- which are remarkably effective tools, by the way! This is work in progress; more features, such as iterators, will be added in the future. To install Sek, just type opam update && opam install sek Documentation is at http://cambium.inria.fr/~fpottier/sek/doc/sek/Sek/index.html Feedback is welcome! -- Arthur Charguéraud François Pottier with contributions by Émilie Guermeur ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] Announcing Sek, an efficient implementation of sequences 2020-04-04 10:11 [Caml-list] Announcing Sek, an efficient implementation of sequences François Pottier @ 2020-04-04 13:17 ` Gabriel Scherer 2020-04-04 13:29 ` orbifx 2020-04-05 11:35 ` François Pottier 0 siblings, 2 replies; 7+ messages in thread From: Gabriel Scherer @ 2020-04-04 13:17 UTC (permalink / raw) To: François Pottier, Arthur Charguéraud; +Cc: caml users [-- Attachment #1: Type: text/plain, Size: 5623 bytes --] Looks very nice, thanks! Some feedback from reading the documentation. 1. gitlab.inria.fr makes it difficult for non-INRIA people to contribute (they cannot open a pull/merge request). It is a mistake to use it for publicly-distributed software. (In theory it would be fine if you never needed outside help, if you were the perfect maintainer whose opam, dune and whatever stuff is always up-to-date. In practice no one is.) 2. > Instead, one should first build an ephemeral sequence through a series of ephemeral push operations, > then convert it to a persistent sequence. This way, the construction cost is only *O(n+K)*. I would guess that it is O(T + K), or n has to be defined first. 3. The list of function complexities is hard to digest. Would it be possible to have, in addition or replacement, the complexity of each operation mentioned in the documentation of the operation? (This makes it easy to lookup the complexity of a given function) 4. I had never seen the trick of single-value functor parameters used as labelled arguments. It is interesting. Another approach would be to have a single Parameters module with one value for each runtime parameter. This would be syntactically lighter, and I cannot think of any downside right now. (One minor upside in general is that one can add new runtime parameters without changing the functor abstraction code / formal parameter list.) 5. > Furthermore, in a series of push operations, starting from an empty sequence, the amortized cost of one push operation is only *O(1)*. If I undestand, this sentence is in tension with the sentence mentioning a quadratic behavior up to T. Both are true, but they give contradictory advise on how to write your code. Maybe you could recall here that one should only take advantage of this property for sequences that are expected to be much larger than T? (In general there is little guidance in the documentation on how to set T, whereas there is a number given for K. For example, if I don't know the data size in advance, is it reasonable to use T=2? Maybe you could benchmark a good choice of T for some workload depending on the average size of your sequences (N=10^2, 10^4, 10^6), and let us know. 6. The natural mental model for snapshot_and_clear is a transfer of unique ownership (into shareable ownership). If this is the model I am using in my program, then performing any operation after the snapshot would be a programming mistake. Your _and_clear specification avoids any undefined or surprising/undocumented behavior in the case that happens, but it may still hide my programming mistake. Could there be a control-transfer version that "poisons" the ephemeral structure, so that any later operation on it raises an exception? On Sat, Apr 4, 2020 at 12:12 PM François Pottier <francois.pottier@inria.fr> wrote: > > Fellow OCaml users, > > We are pleased to announce the first release of Sek, an OCaml library that > offers an efficient implementation of sequences. > > The library offers both ephemeral (mutable) sequences and persistent > (immutable) sequences, and offers constant-time conversions between these > flavors. > > It supports all of the standard operations on stacks, queues, deques (e.g. > push, pop at either end), catenable sequences (concat, split), and random > access sequences (get, set). > > Data is stored internally in chunks (fixed-capacity arrays), > which is why this data structure is known as a chunK SEquence. > > It is intended to achieve excellent time complexity and memory usage. > > This is an initial release. The library has not been tested in production, > but has received extensive unit testing, via afl-fuzz and ocaml+afl -- > which are remarkably effective tools, by the way! > > This is work in progress; more features, such as iterators, will be added > in the future. > > To install Sek, just type > > opam update && opam install sek > > Documentation is at > > http://cambium.inria.fr/~fpottier/sek/doc/sek/Sek/index.html > > Feedback is welcome! > > -- > Arthur Charguéraud > François Pottier > with contributions by Émilie Guermeur > On Sat, Apr 4, 2020 at 12:12 PM François Pottier <francois.pottier@inria.fr> wrote: > > Fellow OCaml users, > > We are pleased to announce the first release of Sek, an OCaml library that > offers an efficient implementation of sequences. > > The library offers both ephemeral (mutable) sequences and persistent > (immutable) sequences, and offers constant-time conversions between these > flavors. > > It supports all of the standard operations on stacks, queues, deques (e.g. > push, pop at either end), catenable sequences (concat, split), and random > access sequences (get, set). > > Data is stored internally in chunks (fixed-capacity arrays), > which is why this data structure is known as a chunK SEquence. > > It is intended to achieve excellent time complexity and memory usage. > > This is an initial release. The library has not been tested in production, > but has received extensive unit testing, via afl-fuzz and ocaml+afl -- > which are remarkably effective tools, by the way! > > This is work in progress; more features, such as iterators, will be added > in the future. > > To install Sek, just type > > opam update && opam install sek > > Documentation is at > > http://cambium.inria.fr/~fpottier/sek/doc/sek/Sek/index.html > > Feedback is welcome! > > -- > Arthur Charguéraud > François Pottier > with contributions by Émilie Guermeur > [-- Attachment #2: Type: text/html, Size: 6903 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] Announcing Sek, an efficient implementation of sequences 2020-04-04 13:17 ` Gabriel Scherer @ 2020-04-04 13:29 ` orbifx 2020-04-04 13:56 ` Gabriel Scherer 2020-04-05 11:35 ` François Pottier 1 sibling, 1 reply; 7+ messages in thread From: orbifx @ 2020-04-04 13:29 UTC (permalink / raw) To: caml-list On 04/04/2020 14:17, Gabriel Scherer wrote: > 1. gitlab.inria.fr <http://gitlab.inria.fr> makes it difficult for non-INRIA people to contribute (they cannot open a pull/merge request). Only for those who don't know how to contribute without "opening" a pull request. Which is otherwise as simple as sending a message with: here is my repo URL, please pull; or here is my patch attachment. > It is a mistake to use it for publicly-distributed software. I think not. I find it positively refreshing to see something outwith Gitlab's dominance. So it's a matter of opinion. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] Announcing Sek, an efficient implementation of sequences 2020-04-04 13:29 ` orbifx @ 2020-04-04 13:56 ` Gabriel Scherer 2020-04-04 14:04 ` orbifx 2020-04-07 22:06 ` Gerd Stolpmann 0 siblings, 2 replies; 7+ messages in thread From: Gabriel Scherer @ 2020-04-04 13:56 UTC (permalink / raw) To: orbifx; +Cc: caml users [-- Attachment #1: Type: text/plain, Size: 2070 bytes --] I don't have strong opinion against mail-based contributions (I suspect they are perceived as less beginner-friendly by newer contributors who have grown up in the web-interface-based-world, but that's a discussion for another time). The problem here is that a select group of contributors can use a certain contribution procedure (here: Merge Requests), and external contributors have to use a different approach (email or issue-based), or ask permission to use the first-group's approach. In this setting the second approach is *objectively* second-class, and this difference objectively sucks. There are other places that host Gitlab instances without this restriction. The restriction is originally caused by design issues in Gitlab, which ties "sending a Merge Request" with "having a fork on the same server" (which paranoid sysadmins want to forbid to public users as it may create resource-consumption issues). Gitlab is working on making it possible to open a Merge Request by just sending a patch ( https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html#new-merge-request-by-email-core-only ), and maybe some other git-host server will rise to visibility with a better federation story ( https://sourcehut.org/ ?). (If you are interested in hosting a web platform for git hosting, I'm told that gitea ( https://gitea.io/ ) is much easier to deploy than Gitlab.) On Sat, Apr 4, 2020 at 3:30 PM orbifx <fox@orbitalfox.eu> wrote: > On 04/04/2020 14:17, Gabriel Scherer wrote: > > 1. gitlab.inria.fr <http://gitlab.inria.fr> makes it difficult for > non-INRIA people to contribute (they cannot open a pull/merge request). > > Only for those who don't know how to contribute without "opening" a pull > request. Which is otherwise as simple as sending a message with: here is my > repo URL, please pull; or here is my patch attachment. > > > It is a mistake to use it for publicly-distributed software. > > I think not. I find it positively refreshing to see something outwith > Gitlab's dominance. So it's a matter of opinion. > > [-- Attachment #2: Type: text/html, Size: 2870 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] Announcing Sek, an efficient implementation of sequences 2020-04-04 13:56 ` Gabriel Scherer @ 2020-04-04 14:04 ` orbifx 2020-04-07 22:06 ` Gerd Stolpmann 1 sibling, 0 replies; 7+ messages in thread From: orbifx @ 2020-04-04 14:04 UTC (permalink / raw) To: Gabriel Scherer; +Cc: caml users On 04/04/2020 14:56, Gabriel Scherer wrote: > I don't have strong opinion against mail-based contributions [..] > There are other places that host Gitlab instances without this restriction.[..] The restriction is originally caused by design issues in Gitlab, Ow, I see what you mean now. > (If you are interested in hosting a web platform for git hosting, I'm told that gitea ( https://gitea.io/ ) is much easier to deploy than Gitlab.) I concur Gitea is easier install and lighter to run than Gitlab. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] Announcing Sek, an efficient implementation of sequences 2020-04-04 13:56 ` Gabriel Scherer 2020-04-04 14:04 ` orbifx @ 2020-04-07 22:06 ` Gerd Stolpmann 1 sibling, 0 replies; 7+ messages in thread From: Gerd Stolpmann @ 2020-04-07 22:06 UTC (permalink / raw) To: caml-list [-- Attachment #1.1.1: Type: text/plain, Size: 1690 bytes --] Am 04.04.20 um 15:56 schrieb Gabriel Scherer: > (If you are interested in hosting a web platform for git hosting, I'm > told that gitea ( https://gitea.io/ ) is much easier to deploy than > Gitlab.) As I went through this in case of Gitlab: the deployment is very simple - they have a docker container. One command and it is running. The issue with a running a public Gitlab instance was that you need to invest some time to maintain it (it is abused for hosting ads of any kind), and that is why I stopped doing this. Gerd > On Sat, Apr 4, 2020 at 3:30 PM orbifx <fox@orbitalfox.eu > <mailto:fox@orbitalfox.eu>> wrote: > > On 04/04/2020 14:17, Gabriel Scherer wrote: > > 1. gitlab.inria.fr <http://gitlab.inria.fr> > <http://gitlab.inria.fr> makes it difficult for non-INRIA people > to contribute (they cannot open a pull/merge request). > > Only for those who don't know how to contribute without "opening" > a pull request. Which is otherwise as simple as sending a message > with: here is my repo URL, please pull; or here is my patch > attachment. > > > It is a mistake to use it for publicly-distributed software. > > I think not. I find it positively refreshing to see something > outwith Gitlab's dominance. So it's a matter of opinion. > -- ------------------------------------------------------------ Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de My OCaml site: http://www.camlcity.org Contact details: http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de ------------------------------------------------------------ [-- Attachment #1.1.2: Type: text/html, Size: 3410 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Caml-list] Announcing Sek, an efficient implementation of sequences 2020-04-04 13:17 ` Gabriel Scherer 2020-04-04 13:29 ` orbifx @ 2020-04-05 11:35 ` François Pottier 1 sibling, 0 replies; 7+ messages in thread From: François Pottier @ 2020-04-05 11:35 UTC (permalink / raw) To: Gabriel Scherer, Arthur Charguéraud; +Cc: caml users Hi Gabriel, Thanks for your remarks; I have made a few improvements to the documentation based on your remarks. In particular, the default value of T is currently 64. On 04/04/2020 15:17, Gabriel Scherer wrote: > 6. Your _and_clear specification avoids any undefined or > surprising/undocumented behavior in the case that happens, but it may still > hide my programming mistake. This is true. We debated various options (described in NOTES.md) but chose the version whose specification is simplest (no undefined behavior, no exception). Because we would prefer to avoid offering many variants of each operation, we will probably keep things this way. -- François Pottier francois.pottier@inria.fr http://cambium.inria.fr/~fpottier/ ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-04-07 22:07 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-04-04 10:11 [Caml-list] Announcing Sek, an efficient implementation of sequences François Pottier 2020-04-04 13:17 ` Gabriel Scherer 2020-04-04 13:29 ` orbifx 2020-04-04 13:56 ` Gabriel Scherer 2020-04-04 14:04 ` orbifx 2020-04-07 22:06 ` Gerd Stolpmann 2020-04-05 11:35 ` François Pottier
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox