* [Caml-list] File synchronization implementation(s) in OCaml?
From: Evgeny Roubinchtein @ 2018-02-08 3:05 UTC
To: OCaml Mailing List

Dear OCaml users and developers,

Do you have advice on:

1. Practical file synchronization algorithms. Rsync is the low bar for my purposes here, i.e., I don't want anything that performs worse than rsync in practice, but I am wondering if there is a way to do better. My completely uninformed attempt at searching the literature turned up this paper: http://engineering.nyu.edu/~suel/papers/recon.pdf, but I don't know anything about the area, so I am afraid that I don't even know what I don't know about the subject :-). I am also aware that Unison has an implementation of an rsync-like algorithm, but I don't know much more than that about it.

2. Existing implementation(s) of said algorithms in OCaml.

Thank you in advance!

--
Best,
Evgeny ("Zhenya")
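For reference, the "rsync-like algorithm" mentioned above hinges on a weak checksum that can be slid along a file one byte at a time in constant time, so every offset can be checked against the other side's block checksums. A minimal OCaml sketch, assuming the formulas from the rsync technical report with sums kept modulo 2^16; it is not rsync's or Unison's actual code, and the names `init`, `roll`, and `digest` are hypothetical:

```ocaml
(* Sketch of rsync's weak rolling checksum. Window sums are kept
   modulo m = 2^16, as in the rsync technical report. *)
let m = 1 lsl 16

(* Non-negative remainder: OCaml's [mod] can return negative values. *)
let modm x = ((x mod m) + m) mod m

(* [a] = plain sum of the window's bytes; [b] = weighted sum in which the
   first byte counts [len] times and the last byte counts once. *)
type state = { a : int; b : int; len : int }

(* Checksum of s.[off] .. s.[off + len - 1], computed from scratch. *)
let init (s : string) (off : int) (len : int) : state =
  let a = ref 0 and b = ref 0 in
  for i = 0 to len - 1 do
    let x = Char.code s.[off + i] in
    a := modm (!a + x);
    b := modm (!b + ((len - i) * x))
  done;
  { a = !a; b = !b; len }

(* Slide the window right by one byte in O(1): [out_c] leaves on the
   left and [in_c] enters on the right. *)
let roll (st : state) (out_c : char) (in_c : char) : state =
  let x_out = Char.code out_c and x_in = Char.code in_c in
  let a = modm (st.a - x_out + x_in) in
  (* every remaining byte gains one unit of weight, i.e. + new a *)
  let b = modm (st.b - (st.len * x_out) + a) in
  { st with a; b }

(* 32-bit digest a + 2^16 * b. *)
let digest (st : state) : int = st.a + (st.b lsl 16)
```

In rsync, the side holding the new file slides this window over it, looks each weak digest up in a table of the other side's per-block digests, and only computes a stronger hash to confirm a match on a hit.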
* Re: [Caml-list] File synchronization implementation(s) in OCaml?
From: Malcolm Matalka @ 2018-02-08 7:24 UTC
To: Evgeny Roubinchtein
Cc: OCaml Mailing List

Check out Unison:

https://www.cis.upenn.edu/~bcpierce/unison/
* Re: [Caml-list] File synchronization implementation(s) in OCaml?
From: Francois BERENGER @ 2018-02-08 8:01 UTC
To: caml-list

The algorithm behind the tarsnap service looks cool. I think it works even on binary files.

https://www.tarsnap.com/

I think the exact algorithm is given in Colin Percival's thesis:

https://ora.ox.ac.uk/objects/uuid:4f0d53cc-fb9f-4246-a835-3c8734eba735/datastreams/THESIS01
* Re: [Caml-list] File synchronization implementation(s) in OCaml?
From: Cedric Cellier @ 2018-02-08 13:11 UTC
To: OCaml Mailing List

-[ Wed, Feb 07, 2018 at 07:05:15PM -0800, Evgeny Roubinchtein ]----
> I don't want anything that performs worse than rsync in practice,

It would be interesting to know how you measure performance, as one could think of many metrics:

- speed on very different data
- speed on similar data
- reliability in the face of simultaneous synchronizations
- reliability in case of a bad network
- resource usage
- confidentiality
- ...?
* Re: [Caml-list] File synchronization implementation(s) in OCaml?
From: Yaron Minsky @ 2018-02-08 14:44 UTC
To: Cedric Cellier
Cc: OCaml Mailing List

Ha! The paper you linked builds off of an old paper I wrote:

http://cis.poly.edu/westlab/papers/ref/practical.pdf

There is in fact a full implementation of the algorithms in this paper in OCaml, as part of the SKS system that I wrote many years ago.

https://bitbucket.org/skskeyserver/sks-keyserver/wiki/Home

I'm not especially proud of the code, but it does work...

y

--
Caml-list mailing list. Subscription management and archives:
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs
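SKS's reconciliation is generally described as set reconciliation with characteristic polynomials: a set S is encoded as chi_S(z) = product over x in S of (z - x), and because elements common to A and B cancel, chi_A(z) / chi_B(z) = chi_{A\B}(z) / chi_{B\A}(z). Exchanging evaluations at a number of points roughly proportional to the size of the difference, rather than the size of the sets, is what keeps communication low; recovering the differing elements then needs a rational-function interpolation step that is omitted here. The OCaml sketch below only checks the cancellation identity; it is not taken from SKS, and the prime, the function names, and the list-as-set representation are assumptions for illustration:

```ocaml
(* Sketch of the identity behind polynomial set reconciliation.
   Arithmetic is modulo the prime p = 2^31 - 1; set elements and sample
   points are assumed to lie in [0, p). A product of two residues stays
   below 2^62, so this assumes 64-bit OCaml (max_int = 2^62 - 1). *)
let p = 2147483647

let mulm x y = (x * y) mod p
let subm x y = (((x - y) mod p) + p) mod p

(* chi s z = product over x in s of (z - x), mod p. *)
let chi (s : int list) (z : int) : int =
  List.fold_left (fun acc x -> mulm acc (subm z x)) 1 s

(* Elements of [a] that are not in [b]; lists stand in for small sets. *)
let diff (a : int list) (b : int list) : int list =
  List.filter (fun x -> not (List.mem x b)) a

(* Common elements cancel, so at every sample point z:
   chi a z * chi (b \ a) z = chi b z * chi (a \ b) z  (mod p). *)
let identity_holds (a : int list) (b : int list) (z : int) : bool =
  mulm (chi a z) (chi (diff b a) z) = mulm (chi b z) (chi (diff a b) z)
```

The identity is written in multiplied-out form so that it also holds at sample points that happen to be set elements, where the ratio itself would be undefined.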
* Re: [Caml-list] File synchronization implementation(s) in OCaml?
From: Evgeny Roubinchtein @ 2018-02-08 15:42 UTC
To: OCaml Mailing List

Thank you everyone for all the responses and pointers to artefacts and papers.

Cedric, that is an excellent point; thank you. My current intended use is having the user edit file(s) on host A using a text editor and, when the user saves a file, having those edits reflected as quickly as possible in the corresponding file on host B (there is a mapping between file paths on hosts A and B). So, my primary measure of performance is the speed of update: I want the time between the events "the user, who runs the editor on host A, has issued a command to the editor to save the file" and "the file the user has just saved on host A and the corresponding file on host B have identical contents" to be as low as feasible.

Sometimes, the user will create a new file on host A; then the file's contents need to be put into the corresponding location on host B as quickly as possible. Occasionally, the user may also have their text editor save a number of files in quick (at least on a human scale) succession: for example, the user issues "M-x compile" in Emacs, and Emacs offers to save all modified buffers before running the compilation. In that case, the total time to propagate the changes to all modified files should be as low as possible. So, to summarize:

1. One-way synchronization is acceptable (I may find out otherwise with experience, but, for now, I am willing to make that assumption).
2. It is always known which file(s) have been modified.
3. It is probably feasible to plumb the information about which part(s) of each file were modified through from the text editor to the updater, if that helps with the speed of the update. The editor usually "knows" what has changed, because it needs to be able to undo the changes.
4. The relevant metric is the speed of the update.
5. Sometimes changes to more than one file may be saved in quick (human-scale) succession. The relevant metric is still the speed of update to all files.

I apologize for the somewhat long-winded description of the metric I care about. I do hope that clarifies it.

--
Best,
Evgeny ("Zhenya")
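Point 3 suggests the delta need not be computed at all: if the editor reports which regions changed, the updater can ship only those regions. A minimal OCaml sketch, assuming a hypothetical edit format in which each edit gives a byte offset, a number of bytes deleted, and the text inserted, all relative to the old contents, sorted by offset and non-overlapping; the `edit` type and `apply` are illustrations, not any editor's real API:

```ocaml
(* Hypothetical edit: at byte [off] of the OLD contents, delete [del]
   bytes and insert [ins] in their place. *)
type edit = { off : int; del : int; ins : string }

(* Rebuild the new contents from the old contents plus a list of edits
   sorted by [off], non-overlapping, and all relative to [old]. *)
let apply (old : string) (edits : edit list) : string =
  let buf = Buffer.create (String.length old) in
  let pos =
    List.fold_left
      (fun pos e ->
        if e.off < pos || e.del < 0 || e.off + e.del > String.length old then
          invalid_arg "apply: overlapping or out-of-range edit";
        (* copy the unchanged stretch between the previous edit and this one *)
        Buffer.add_substring buf old pos (e.off - pos);
        Buffer.add_string buf e.ins;
        (* resume copying after the deleted bytes *)
        e.off + e.del)
      0 edits
  in
  (* copy the unchanged tail after the last edit *)
  Buffer.add_substring buf old pos (String.length old - pos);
  Buffer.contents buf
```

Before applying the edits, host B would want to confirm that its copy really matches the old contents (for example, by comparing a hash sent along with the edits) and fall back to a full or rsync-style transfer if it does not.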
* Re: [Caml-list] File synchronization implementation(s) in OCaml?
From: Hendrik Boom @ 2018-02-08 15:50 UTC
To: caml-list

Unless your requirements are very different from mine, I suspect you want distributed revision management. The system I use for that is monotone. Once set up, it's a lot easier to use than git.

-- hendrik