* Asynchronous IO programming in OCaml @ 2010-10-24 10:34 Jon Harrop 2010-10-24 12:51 ` [Caml-list] " philippe ` (3 more replies) 0 siblings, 4 replies; 26+ messages in thread From: Jon Harrop @ 2010-10-24 10:34 UTC (permalink / raw) To: caml-list Is there a tutorial on using something like LWT for asynchronous programming in OCaml? I'm looking for an example like an echo server that handles clients concurrently without blocking threads, so it can handle thousands of clients without significant performance degradation. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 10:34 Asynchronous IO programming in OCaml Jon Harrop @ 2010-10-24 12:51 ` philippe 2010-10-24 12:52 ` Dario Teixeira ` (2 subsequent siblings) 3 siblings, 0 replies; 26+ messages in thread From: philippe @ 2010-10-24 12:51 UTC (permalink / raw) To: caml-list Not an answer to your exact question, but there's something ressembling the reactor pattern in ocaml-nae, ocnae, which is a pattern to look at when that kind of use become prevalent. On Sun, Oct 24, 2010 at 12:34 PM, Jon Harrop <jon@ffconsultancy.com> wrote: > Is there a tutorial on using something like LWT for asynchronous programming > in OCaml? I'm looking for an example like an echo server that handles > clients concurrently without blocking threads, so it can handle thousands of > clients without significant performance degradation. > > -- > Dr Jon Harrop, Flying Frog Consultancy Ltd. > http://www.ffconsultancy.com > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 10:34 Asynchronous IO programming in OCaml Jon Harrop 2010-10-24 12:51 ` [Caml-list] " philippe @ 2010-10-24 12:52 ` Dario Teixeira 2010-10-24 16:33 ` oliver 2010-10-24 16:17 ` Jake Donham 2010-10-24 20:42 ` Goswin von Brederlow 3 siblings, 1 reply; 26+ messages in thread From: Dario Teixeira @ 2010-10-24 12:52 UTC (permalink / raw) To: caml-list, Jon Harrop Hi, > Is there a tutorial on using something like LWT for asynchronous > programming in OCaml? I'm looking for an example like an echo server > that handles clients concurrently without blocking threads, so it can > handle thousands of clients without significant performance degradation. Lwt comes with a manual, which should be enough to get you started. If you are looking for real-world examples, then the source of the Ocsigen server is the most obvious place to look. (Lwt was my first exposure to the monadic style of programming; this caused some head-scratching in the beginning, but after a while it became second nature. I reckon this experience might be common to other Lwt users). Hope that helps, Dario Teixeira ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 12:52 ` Dario Teixeira @ 2010-10-24 16:33 ` oliver 2010-10-24 18:50 ` Dario Teixeira 2010-10-24 21:51 ` Michael Ekstrand 0 siblings, 2 replies; 26+ messages in thread From: oliver @ 2010-10-24 16:33 UTC (permalink / raw) To: caml-list On Sun, Oct 24, 2010 at 05:52:19AM -0700, Dario Teixeira wrote: [...] > (Lwt was my first exposure to the monadic style of programming; this > caused some head-scratching in the beginning, but after a while it > became second nature. I reckon this experience might be common to > other Lwt users). [...] Can you recommend papers on monadic programming? Or how did you mastered it? Ciao, Oliver ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 16:33 ` oliver @ 2010-10-24 18:50 ` Dario Teixeira 2010-10-24 19:04 ` bluestorm 2010-10-24 20:02 ` oliver 2010-10-24 21:51 ` Michael Ekstrand 1 sibling, 2 replies; 26+ messages in thread From: Dario Teixeira @ 2010-10-24 18:50 UTC (permalink / raw) To: caml-list, oliver Hi, > Can you recommend papers on monadic programming? > Or how did you mastered it? "Mastered" it might be too strong a word... :-) Anyway, my recommendation is to simply start using it and let practice do its thing. (In my case practice came from developing Ocsigen/Eliom apps). As for books or tutorials, I would suggest taking a look at material for learning Haskell. Recently, some well-publicised Haskell books targeted at beginners have come out [1,2]. No introduction to Haskell is really complete without also discussing monads. (Reading Haskell is fairly straightforward for those familiar with Ocaml, btw). Cheers, Dario Teixeira [1] http://book.realworldhaskell.org/ [2] http://learnyouahaskell.com/ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 18:50 ` Dario Teixeira @ 2010-10-24 19:04 ` bluestorm 2010-10-24 20:02 ` oliver 1 sibling, 0 replies; 26+ messages in thread From: bluestorm @ 2010-10-24 19:04 UTC (permalink / raw) To: oliver; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 1509 bytes --] A very good introduction to monads for the programmer, in my opinion, is "Monads for functional programming", by Philip Wadler http://homepages.inf.ed.ac.uk/wadler/papers/marktoberdorf/baastad.pdf If one wish to stay in OCaml country, there is a blog post by Brian Hurt : http://enfranchisedmind.com/blog/posts/a-monad-tutorial-for-ocaml/ On Sun, Oct 24, 2010 at 8:50 PM, Dario Teixeira <darioteixeira@yahoo.com> wrote: > Hi, > > > Can you recommend papers on monadic programming? > > Or how did you mastered it? > > "Mastered" it might be too strong a word... :-) Anyway, my recommendation > is to simply start using it and let practice do its thing. (In my case > practice came from developing Ocsigen/Eliom apps). > > As for books or tutorials, I would suggest taking a look at material for > learning Haskell. Recently, some well-publicised Haskell books targeted > at beginners have come out [1,2]. No introduction to Haskell is really > complete without also discussing monads. (Reading Haskell is fairly > straightforward for those familiar with Ocaml, btw). > > Cheers, > Dario Teixeira > > [1] http://book.realworldhaskell.org/ > [2] http://learnyouahaskell.com/ > > > > > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > [-- Attachment #2: Type: text/html, Size: 2732 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 18:50 ` Dario Teixeira 2010-10-24 19:04 ` bluestorm @ 2010-10-24 20:02 ` oliver 1 sibling, 0 replies; 26+ messages in thread From: oliver @ 2010-10-24 20:02 UTC (permalink / raw) To: Dario Teixeira; +Cc: caml-list On Sun, Oct 24, 2010 at 11:50:16AM -0700, Dario Teixeira wrote: [...] > As for books or tutorials, I would suggest taking a look at material for > learning Haskell. heheh, that would have been my next question... ...I may refresh my Haskell, and climb the big mountain of Monads there (in Haskell-land), where I last time stopped climbing that mountain and turned to OCaml. ;) > Recently, some well-publicised Haskell books targeted > at beginners have come out [1,2]. Thanks for the hint. Ciao, Oliver ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 16:33 ` oliver 2010-10-24 18:50 ` Dario Teixeira @ 2010-10-24 21:51 ` Michael Ekstrand 1 sibling, 0 replies; 26+ messages in thread From: Michael Ekstrand @ 2010-10-24 21:51 UTC (permalink / raw) To: oliver, caml-list On 10/24/2010 11:33 AM, oliver@first.in-berlin.de wrote: > On Sun, Oct 24, 2010 at 05:52:19AM -0700, Dario Teixeira wrote: > [...] >> (Lwt was my first exposure to the monadic style of programming; this >> caused some head-scratching in the beginning, but after a while it >> became second nature. I reckon this experience might be common to >> other Lwt users). > [...] > > Can you recommend papers on monadic programming? > Or how did you mastered it? I recommend learning it by just doing Lwt programming. Once you've learned how to use Lwt, then it is much easier IMO to understand the wider world of monads. - Michael ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 10:34 Asynchronous IO programming in OCaml Jon Harrop 2010-10-24 12:51 ` [Caml-list] " philippe 2010-10-24 12:52 ` Dario Teixeira @ 2010-10-24 16:17 ` Jake Donham 2010-10-24 20:54 ` Anil Madhavapeddy 2010-10-24 20:42 ` Goswin von Brederlow 3 siblings, 1 reply; 26+ messages in thread From: Jake Donham @ 2010-10-24 16:17 UTC (permalink / raw) To: Jon Harrop; +Cc: caml-list [-- Attachment #1: Type: text/plain, Size: 513 bytes --] On Sun, Oct 24, 2010 at 3:34 AM, Jon Harrop <jon@ffconsultancy.com> wrote: > Is there a tutorial on using something like LWT for asynchronous > programming > in OCaml? I'm looking for an example like an echo server that handles > clients concurrently without blocking threads, so it can handle thousands > of > clients without significant performance degradation. > Not a tutorial, but here is a minimal TCP server in LWT: http://github.com/avsm/ocaml-cohttpserver/blob/master/server/http_tcp_server.ml Jake [-- Attachment #2: Type: text/html, Size: 934 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 16:17 ` Jake Donham @ 2010-10-24 20:54 ` Anil Madhavapeddy 2010-10-24 22:50 ` Jérémie Dimino 0 siblings, 1 reply; 26+ messages in thread From: Anil Madhavapeddy @ 2010-10-24 20:54 UTC (permalink / raw) To: Jake Donham; +Cc: Jon Harrop, caml-list [-- Attachment #1: Type: text/plain, Size: 1360 bytes --] On 24 Oct 2010, at 09:17, Jake Donham wrote: > On Sun, Oct 24, 2010 at 3:34 AM, Jon Harrop <jon@ffconsultancy.com> wrote: > Is there a tutorial on using something like LWT for asynchronous programming > in OCaml? I'm looking for an example like an echo server that handles > clients concurrently without blocking threads, so it can handle thousands of > clients without significant performance degradation. > > Not a tutorial, but here is a minimal TCP server in LWT: > > http://github.com/avsm/ocaml-cohttpserver/blob/master/server/http_tcp_server.ml This should work fine for a couple of thousand clients or so, but you'll begin to see degradation as the number of clients increase. This is because LWT internally uses select(2) to wait for file-descriptors, and not the newer kqueue(2) or epoll(2) interfaces. You can read more about the "C10K problem" here: http://www.kegel.com/c10k.html I've got a stripped-down version of LWT that uses these newer event-driven kernel interfaces (and so should be able to cross the 10,000 client barrier fairly easily), but it won't be ready for release for another month or so. Drop me an email off-list if you want to try it out earlier. Async disk I/O under Linux is annoyingly problematic if you aren't using direct I/O (and hence page/block-aligned structures). Avoid if possible :) Anil [-- Attachment #2: Type: text/html, Size: 2244 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 20:54 ` Anil Madhavapeddy @ 2010-10-24 22:50 ` Jérémie Dimino 2010-10-25 3:42 ` Markus Mottl 2010-10-25 8:42 ` Goswin von Brederlow 0 siblings, 2 replies; 26+ messages in thread From: Jérémie Dimino @ 2010-10-24 22:50 UTC (permalink / raw) To: Anil Madhavapeddy; +Cc: Jake Donham, Jon Harrop, caml-list On Sun, Oct 24, 2010 at 01:54:50PM -0700, Anil Madhavapeddy wrote: > This should work fine for a couple of thousand clients or so, but you'll > begin to see degradation as the number of clients increase. This is > because LWT internally uses select(2) to wait for file-descriptors, and > not the newer kqueue(2) or epoll(2) interfaces. You can read more about > the "C10K problem" here: [3]http://www.kegel.com/c10k.html > I've got a stripped-down version of LWT that uses these newer event-driven > kernel interfaces (and so should be able to cross the 10,000 client > barrier fairly easily), but it won't be ready for release for another > month or so. Drop me an email off-list if you want to try it out earlier. I made an implementation of lwt using libev [1]. I tested it with ocsigen and ab but the result was always a bit better with select than with epoll. That is why i did not replace select by libev in the main branch. In fact i never found the source of any benchmark comparing select to epoll on the web. > Async disk I/O under Linux is annoyingly problematic if you aren't using > direct I/O (and hence page/block-aligned structures). Avoid if possible :) The next version of Lwt will support asynchronous disk I/O by using mincore + mmap. It is already in the development version. Jérémie [1] http://www.dimino.org/cgi-bin/darcsweb.cgi?r=lwt-libev;a=summary ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 22:50 ` Jérémie Dimino @ 2010-10-25 3:42 ` Markus Mottl 2010-10-25 7:49 ` Richard Jones 2010-10-25 8:42 ` Goswin von Brederlow 1 sibling, 1 reply; 26+ messages in thread From: Markus Mottl @ 2010-10-25 3:42 UTC (permalink / raw) To: Jérémie Dimino; +Cc: Anil Madhavapeddy, Jon Harrop, caml-list On Sun, Oct 24, 2010 at 18:50, Jérémie Dimino <jeremie@dimino.org> wrote: > I made an implementation of lwt using libev [1]. I tested it with > ocsigen and ab but the result was always a bit better with select than > with epoll. That is why i did not replace select by libev in the main > branch. In fact i never found the source of any benchmark comparing > select to epoll on the web. The performance of select was also usually slightly better in my experiments than with epoll for at least a few tens of descriptors. It really depends on what your requirements are. If you are facing hundreds or even thousands of connections, you'll probably want to consider epoll. select does not scale well. Regards, Markus -- Markus Mottl http://www.ocaml.info markus.mottl@gmail.com ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-25 3:42 ` Markus Mottl @ 2010-10-25 7:49 ` Richard Jones 0 siblings, 0 replies; 26+ messages in thread From: Richard Jones @ 2010-10-25 7:49 UTC (permalink / raw) To: Markus Mottl Cc: Jérémie Dimino, Jon Harrop, caml-list, Anil Madhavapeddy On Sun, Oct 24, 2010 at 11:42:45PM -0400, Markus Mottl wrote: > On Sun, Oct 24, 2010 at 18:50, Jérémie Dimino <jeremie@dimino.org> wrote: > > I made an implementation of lwt using libev [1]. I tested it with > > ocsigen and ab but the result was always a bit better with select than > > with epoll. That is why i did not replace select by libev in the main > > branch. In fact i never found the source of any benchmark comparing > > select to epoll on the web. > > The performance of select was also usually slightly better in my > experiments than with epoll for at least a few tens of descriptors. > It really depends on what your requirements are. If you are facing > hundreds or even thousands of connections, you'll probably want to > consider epoll. select does not scale well. I think if you really had 10,000 clients per server you'd probably want to consider your whole architecture. Putting nginx or a very cut-down Apache on the front and memcached between the webserver and the database. Rich. -- Richard Jones Red Hat ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 22:50 ` Jérémie Dimino 2010-10-25 3:42 ` Markus Mottl @ 2010-10-25 8:42 ` Goswin von Brederlow 2010-10-25 11:10 ` Jérémie Dimino 1 sibling, 1 reply; 26+ messages in thread From: Goswin von Brederlow @ 2010-10-25 8:42 UTC (permalink / raw) To: caml-list Jérémie Dimino <jeremie@dimino.org> writes: > On Sun, Oct 24, 2010 at 01:54:50PM -0700, Anil Madhavapeddy wrote: >> Async disk I/O under Linux is annoyingly problematic if you aren't using >> direct I/O (and hence page/block-aligned structures). Avoid if possible :) > > The next version of Lwt will support asynchronous disk I/O by using > mincore + mmap. It is already in the development version. > > Jérémie Doesn't that mean you have to do polling of all pending I/O? That seems horrible ineficient (you check them all every time) or slow I/Os can starve quick ones (you only check the oldest ones). The nice thing about libaio is that you get events as I/O completes and you get many events in one simple syscall. Doing thousands of I/O requests in parallel is trivial there. MfG Goswin ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-25 8:42 ` Goswin von Brederlow @ 2010-10-25 11:10 ` Jérémie Dimino [not found] ` <AANLkTimP77PDEChW3Yt6uUy_qxYpj6EOZWQ_==id-LBC@mail.gmail.com> 2010-10-25 15:58 ` DS 0 siblings, 2 replies; 26+ messages in thread From: Jérémie Dimino @ 2010-10-25 11:10 UTC (permalink / raw) To: Goswin von Brederlow; +Cc: caml-list On Mon, Oct 25, 2010 at 10:42:05AM +0200, Goswin von Brederlow wrote: > Doesn't that mean you have to do polling of all pending I/O? That seems > horrible ineficient (you check them all every time) or slow I/Os can > starve quick ones (you only check the oldest ones). No. In the current implementation, when we want to read a file, we mmap it, then test it with mincore. If it is not cached in memory, we first wait a bit, test again and if it is still not available, we launch a thread which try to read the first byte of the mmapped buffer. If the file is not cached, then the time needed to launch a thread is negligible compared to the time needed to fetch data from the disk. > The nice thing about libaio is that you get events as I/O completes and > you get many events in one simple syscall. Doing thousands of I/O > requests in parallel is trivial there. AFAIK libaio is linux-specific and not impelemted for all filesystems. Moreover it transparently fallback to synchronous IOs when asynchronous IOs are not available, so there is no reliable way to test whether IOs will block or not. Jérémie ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <AANLkTimP77PDEChW3Yt6uUy_qxYpj6EOZWQ_==id-LBC@mail.gmail.com>]
[parent not found: <20101025143317.GB32282@aurora>]
* Re: [Caml-list] Asynchronous IO programming in OCaml [not found] ` <20101025143317.GB32282@aurora> @ 2010-10-25 15:34 ` Yaron Minsky 2010-10-25 17:26 ` Jérémie Dimino 0 siblings, 1 reply; 26+ messages in thread From: Yaron Minsky @ 2010-10-25 15:34 UTC (permalink / raw) To: Jérémie Dimino, caml-list [-- Attachment #1: Type: text/plain, Size: 1357 bytes --] Sorry, didn't meant to take this off list. Taking it back to the caml list. I don't quite understand how this whole benchmark holds together. Could you post the C code? I don't understand the differences between (1), (2) and (3) well enough to explain where the factor of 100 comes in. y On Mon, Oct 25, 2010 at 10:33 AM, Jérémie Dimino <jeremie@dimino.org> wrote: > On Mon, Oct 25, 2010 at 07:59:17AM -0400, Yaron Minsky wrote: > > What's the advantage of using mmap here?� Why not just have a few > worker > > threads available for doing file io?� Then you can use the ordinary > file > > API, and you don't need to spin up a brand new thread for each > request. > > It because the cost of switching to another thread is too big. I did the > following benchmarks some time ago (in C): i made three programs reading > a 100Mo file using a loop in the following way: > > 1 - just use read, without threads > 2 - launch a thread at each iteration > 3 - use only one thread created before entering the loop > > And the results were the following: > > - in any case (2) and (3) gave the same result, > - when the file was not in the cache, the execution time was the same > for the three programs, > - when the file was in the cache, (2) and (3) were about 100 times > slower than (1) > > Jérémie > [-- Attachment #2: Type: text/html, Size: 1748 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-25 15:34 ` Yaron Minsky @ 2010-10-25 17:26 ` Jérémie Dimino 2010-10-27 9:33 ` Goswin von Brederlow 0 siblings, 1 reply; 26+ messages in thread From: Jérémie Dimino @ 2010-10-25 17:26 UTC (permalink / raw) To: Yaron Minsky; +Cc: caml-list On Mon, Oct 25, 2010 at 11:34:41AM -0400, Yaron Minsky wrote: > I don't quite understand how this whole benchmark holds together. Could > you post the C code? I don't understand the differences between (1), (2) > and (3) well enough to explain where the factor of 100 comes in. Yes. Here is the code of the first program: ,---- | #include <sys/types.h> | #include <sys/stat.h> | #include <fcntl.h> | #include <unistd.h> | | int main() | { | int fd = open("data", O_RDONLY); | char buffer[4096]; | | while (read(fd, buffer, 4096) > 0); | | close(fd); | | return 0; | } `---- the code of the second: ,---- | #include <sys/types.h> | #include <sys/stat.h> | #include <fcntl.h> | #include <unistd.h> | #include <pthread.h> | | int fd; | char buffer[4096]; | int done = 0; | | void *callback(void* data) | { | int count = read(fd, buffer, 4096); | if (count == 0) done = 1; | return NULL; | } | | int main() | { | fd = open("data", O_RDONLY); | | while (!done) { | pthread_t thread; | pthread_create(&thread, NULL, callback, NULL); | pthread_join(thread, NULL); | } | | close(fd); | | return 0; | } `---- and the third: ,---- | #include <sys/types.h> | #include <sys/stat.h> | #include <fcntl.h> | #include <unistd.h> | #include <pthread.h> | | int fd; | char buffer[4096]; | int done = 0; | pthread_cond_t start = PTHREAD_COND_INITIALIZER; | pthread_cond_t stop = PTHREAD_COND_INITIALIZER; | pthread_mutex_t start_mutex = PTHREAD_MUTEX_INITIALIZER; | pthread_mutex_t stop_mutex = PTHREAD_MUTEX_INITIALIZER; | | void *callback(void* data) | { | while (!done) { | pthread_cond_wait(&start, &start_mutex); | | int count = read(fd, buffer, 4096); | if (count == 0) done = 1; | | pthread_mutex_lock(&stop_mutex); | pthread_cond_signal(&stop); | pthread_mutex_unlock(&stop_mutex); | } | return NULL; | } | | int main() | { | fd = open("data", O_RDONLY); | | pthread_cond_init(&start, NULL); | pthread_cond_init(&stop, NULL); | | pthread_mutex_lock(&start_mutex); | pthread_mutex_lock(&stop_mutex); | | pthread_t thread; | pthread_create(&thread, NULL, callback, NULL); | | while (!done) { | pthread_mutex_lock(&start_mutex); | pthread_cond_signal(&start); | pthread_mutex_unlock(&start_mutex); | | pthread_cond_wait(&stop, &stop_mutex); | } | | pthread_join(thread, NULL); | close(fd); | | return 0; | } `---- Jérémie ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-25 17:26 ` Jérémie Dimino @ 2010-10-27 9:33 ` Goswin von Brederlow 2010-10-27 11:18 ` Jérémie Dimino 0 siblings, 1 reply; 26+ messages in thread From: Goswin von Brederlow @ 2010-10-27 9:33 UTC (permalink / raw) To: Jeremie Dimino; +Cc: Yaron Minsky, caml-list Jérémie Dimino <jeremie@dimino.org> writes: > On Mon, Oct 25, 2010 at 11:34:41AM -0400, Yaron Minsky wrote: >> I don't quite understand how this whole benchmark holds together. Could >> you post the C code? I don't understand the differences between (1), (2) >> and (3) well enough to explain where the factor of 100 comes in. > > Yes. Here is the code of the first program: > > ,---- > | #include <sys/types.h> > | #include <sys/stat.h> > | #include <fcntl.h> > | #include <unistd.h> > | > | int main() > | { > | int fd = open("data", O_RDONLY); > | char buffer[4096]; > | > | while (read(fd, buffer, 4096) > 0); > | > | close(fd); > | > | return 0; > | } > `---- Obvious example so nothing to comment. :) > the code of the second: > > ,---- > | #include <sys/types.h> > | #include <sys/stat.h> > | #include <fcntl.h> > | #include <unistd.h> > | #include <pthread.h> > | > | int fd; > | char buffer[4096]; > | int done = 0; > | > | void *callback(void* data) > | { > | int count = read(fd, buffer, 4096); > | if (count == 0) done = 1; > | return NULL; > | } > | > | int main() > | { > | fd = open("data", O_RDONLY); > | > | while (!done) { > | pthread_t thread; > | pthread_create(&thread, NULL, callback, NULL); > | pthread_join(thread, NULL); > | } > | > | close(fd); > | > | return 0; > | } > `---- You aren't doing any multithreading. You are creating a thread and waiting for the thread to finish its read before strating a second. There are never ever 2 reads running in parallel. So all you do is add thread creation and destruction for every read to your first example. You should start multiple threads and let them read from different offsets (use pread) and only once they are all started join them all again. > and the third: > > ,---- > | #include <sys/types.h> > | #include <sys/stat.h> > | #include <fcntl.h> > | #include <unistd.h> > | #include <pthread.h> > | > | int fd; > | char buffer[4096]; > | int done = 0; > | pthread_cond_t start = PTHREAD_COND_INITIALIZER; > | pthread_cond_t stop = PTHREAD_COND_INITIALIZER; > | pthread_mutex_t start_mutex = PTHREAD_MUTEX_INITIALIZER; > | pthread_mutex_t stop_mutex = PTHREAD_MUTEX_INITIALIZER; > | > | void *callback(void* data) > | { > | while (!done) { > | pthread_cond_wait(&start, &start_mutex); > | > | int count = read(fd, buffer, 4096); > | if (count == 0) done = 1; > | > | pthread_mutex_lock(&stop_mutex); > | pthread_cond_signal(&stop); > | pthread_mutex_unlock(&stop_mutex); > | } > | return NULL; > | } > | > | int main() > | { > | fd = open("data", O_RDONLY); > | > | pthread_cond_init(&start, NULL); > | pthread_cond_init(&stop, NULL); > | > | pthread_mutex_lock(&start_mutex); > | pthread_mutex_lock(&stop_mutex); > | > | pthread_t thread; > | pthread_create(&thread, NULL, callback, NULL); > | > | while (!done) { > | pthread_mutex_lock(&start_mutex); > | pthread_cond_signal(&start); > | pthread_mutex_unlock(&start_mutex); > | > | pthread_cond_wait(&stop, &stop_mutex); > | } > | > | pthread_join(thread, NULL); > | close(fd); > | > | return 0; > | } > `---- Again no parallelism at all. Instead of thread creation and destruction you now add mutexes between the reads. Again the only expected result is a slowdown. You should create X threads at the start and then repeadately give them work (an offset to read). Only when they are all busy you wait for one of them to become idle again. > Jérémie So far you have failed to do asynchronous IO. You examples all wait for the read to complete making it synchronous and then threads are obviously slower. Also sequential reads from a file should result in sequential reads from the disk (unless the filesystem fragments the file a lot or you created it that way). That is the ideal situation for the disks and kernels read-ahead. One advantage of doing many read/writes asynchronous is that the kernel can reorder the requests and potentially merge requests. Unless you crafted your input file to be fragmented you won't see this effect in a test with sequential reads. And that effect would be the only thing making multithreaded reads faster in a test like the above. MfG Goswin ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-27 9:33 ` Goswin von Brederlow @ 2010-10-27 11:18 ` Jérémie Dimino 2010-10-27 13:43 ` Goswin von Brederlow 0 siblings, 1 reply; 26+ messages in thread From: Jérémie Dimino @ 2010-10-27 11:18 UTC (permalink / raw) To: Goswin von Brederlow; +Cc: Yaron Minsky, caml-list On Wed, Oct 27, 2010 at 11:33:51AM +0200, Goswin von Brederlow wrote: > You aren't doing any multithreading. You are creating a thread and > waiting for the thread to finish its read before strating a second. > There are never ever 2 reads running in parallel. So all you do is add > thread creation and destruction for every read to your first example. Yes, i know that. The idea was just to show the overhead of context switches. > You should start multiple threads and let them read from different > offsets (use pread) and only once they are all started join them all > again. Sure, but doing this directly in Lwt raises other problems: - This means prefetching a large part of the file into the program memory. - The kernel already prefetches files from the disk, so most of the time this is just a memory copy in parallel... - How many threads do we launch in parallel ? This depends on the overhead of context switching between threads, which can not be determined at runtime. I think that the solution with mincore + mmap is better because it uses threads only when really needed. Jérémie ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-27 11:18 ` Jérémie Dimino @ 2010-10-27 13:43 ` Goswin von Brederlow 2010-10-27 15:30 ` Jérémie Dimino 0 siblings, 1 reply; 26+ messages in thread From: Goswin von Brederlow @ 2010-10-27 13:43 UTC (permalink / raw) To: caml-list Jérémie Dimino <jeremie@dimino.org> writes: > On Wed, Oct 27, 2010 at 11:33:51AM +0200, Goswin von Brederlow wrote: >> You aren't doing any multithreading. You are creating a thread and >> waiting for the thread to finish its read before strating a second. >> There are never ever 2 reads running in parallel. So all you do is add >> thread creation and destruction for every read to your first example. > > Yes, i know that. The idea was just to show the overhead of context > switches. But then you tune your benchmark to give you the result you want instead of benchmarking what normaly happens. You already have a context switch when the read returns (or any other syscall that blocks). The context switch to a thread that waits on it costs the same as switching to the main thread. When it blocked at least. >> You should start multiple threads and let them read from different >> offsets (use pread) and only once they are all started join them all >> again. > > Sure, but doing this directly in Lwt raises other problems: > > - This means prefetching a large part of the file into the program > memory. Yes. totaly. But if you want to do work asynchonously then you need to spend the memory to keep the data for multiple jobs in memory. > - The kernel already prefetches files from the disk, so most of the time > this is just a memory copy in parallel... That only works for sequential reads on a verry limited number of files (if more than one at all). If you have 1000+ clients requesting files from a webserver for example they will never be covered by the kernels read-ahead. And don't forget. Multi core systems are more and more widely spread. What is wrong with 2 cores doing memcpy twice as fast? > - How many threads do we launch in parallel ? This depends on the > overhead of context switching between threads, which can not be > determined at runtime. > > I think that the solution with mincore + mmap is better because it uses > threads only when really needed. > > Jérémie For reads I have to agree with you there. You can only hide the cost of a thread when the read blocks. By using mincore to test if a read would block first you avoid context switches where they aren't free. Unfortunately you can't use mmap() for write (efficiently). Writing to a mmap()ed file will first read in the old data from disk, then overwrite the memory and sometime later write it back to disk. There seems to be no way to tell the kernel to skip reading in a page on wirst access because one is going to completly overwrite it anyway. Do you have a different solution for writes that will avoid threads when a write won't block? And how do you restart jobs once the data has actually been commited to disk? (i.e. how do you do fsync()?) MfG Goswin ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-27 13:43 ` Goswin von Brederlow @ 2010-10-27 15:30 ` Jérémie Dimino 2010-10-28 9:00 ` Goswin von Brederlow 0 siblings, 1 reply; 26+ messages in thread From: Jérémie Dimino @ 2010-10-27 15:30 UTC (permalink / raw) To: Goswin von Brederlow; +Cc: caml-list On Wed, Oct 27, 2010 at 03:43:37PM +0200, Goswin von Brederlow wrote: > But then you tune your benchmark to give you the result you want instead > of benchmarking what normaly happens. That's true for read or write. The point i wanted to make is that in the general case, i.e. for a system call that blocks, wrapping it into a thread might not be a good solution because it degrades performances too much. > And don't forget. Multi core systems are more and more widely > spread. What is wrong with 2 cores doing memcpy twice as fast? I don't think you can read or write in parallel the RAM with current architectures, but i may be wrong. > Do you have a different solution for writes that will avoid threads when > a write won't block? And how do you restart jobs once the data has > actually been commited to disk? (i.e. how do you do fsync()?) Unfortunately no, we have no solution for write. The main problem i see with doing write in parallels with threads is to handle errors. Jérémie ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-27 15:30 ` Jérémie Dimino @ 2010-10-28 9:00 ` Goswin von Brederlow 2010-10-28 9:28 ` Jérémie Dimino 0 siblings, 1 reply; 26+ messages in thread From: Goswin von Brederlow @ 2010-10-28 9:00 UTC (permalink / raw) To: Jeremie Dimino; +Cc: Goswin von Brederlow, caml-list Jérémie Dimino <jeremie@dimino.org> writes: > On Wed, Oct 27, 2010 at 03:43:37PM +0200, Goswin von Brederlow wrote: >> But then you tune your benchmark to give you the result you want instead >> of benchmarking what normaly happens. > > That's true for read or write. The point i wanted to make is that in the > general case, i.e. for a system call that blocks, wrapping it into a > thread might not be a good solution because it degrades performances too > much. Hehe, and you have prooven yourself wrong. As you said, when the file isn't cache and the syscall actually blocks there is no time difference. The reasons being that the blokcing takes the majority of time anyway and the context switch for the thread on the read doesn't cost anything since the kernel needs to context switch anyway. :) >> And don't forget. Multi core systems are more and more widely >> spread. What is wrong with 2 cores doing memcpy twice as fast? > > I don't think you can read or write in parallel the RAM with current > architectures, but i may be wrong. NUMA has memory dedicated to each core. Each core can access its memory fast. Accessing some other cores memory on the other hand is slower (and will have to fight for access with that core). >> Do you have a different solution for writes that will avoid threads when >> a write won't block? And how do you restart jobs once the data has >> actually been commited to disk? (i.e. how do you do fsync()?) > > Unfortunately no, we have no solution for write. The main problem i see > with doing write in parallels with threads is to handle errors. > > Jérémie Nah, if you can detect the error then handing is easy. The problem is detecting. Write() itself can fail and that is easy to detect. But write() succeeding doesn't mean the data has been written successfully. You can still get an error after that which you only get on fsync(). MfG Goswin ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-28 9:00 ` Goswin von Brederlow @ 2010-10-28 9:28 ` Jérémie Dimino 2010-10-28 10:11 ` Goswin von Brederlow 0 siblings, 1 reply; 26+ messages in thread From: Jérémie Dimino @ 2010-10-28 9:28 UTC (permalink / raw) To: Goswin von Brederlow; +Cc: caml-list On Thu, Oct 28, 2010 at 11:00:59AM +0200, Goswin von Brederlow wrote: > Hehe, and you have prooven yourself wrong. As you said, when the file > isn't cache and the syscall actually blocks there is no time > difference. The reasons being that the blokcing takes the majority of > time anyway and the context switch for the thread on the read doesn't > cost anything since the kernel needs to context switch anyway. :) The kernel will always prefetch a part of the file, so the first read will launch a needed thread but not subsequent reads until the offset reach a part of the file not yet cached. And again, for other system calls, such as fstat, launching a thread is not a good solution. > NUMA has memory dedicated to each core. Each core can access its memory > fast. Accessing some other cores memory on the other hand is slower (and > will have to fight for access with that core). And so that is a bad solution for Lwt since it runs on one system thread. > Nah, if you can detect the error then handing is easy. The problem is > detecting. Write() itself can fail and that is easy to detect. But > write() succeeding doesn't mean the data has been written > successfully. You can still get an error after that which you only get > on fsync(). Take for example buffered channels, when you would have to flush the buffer, you would launch a thread calling write and let the program continue to write onto the channel, assuming that the write succeed. But if one write fails, you would get the error latter. And worst, if the program stop using the buffer after the flush, it will never get the error. Jérémie ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-28 9:28 ` Jérémie Dimino @ 2010-10-28 10:11 ` Goswin von Brederlow 0 siblings, 0 replies; 26+ messages in thread From: Goswin von Brederlow @ 2010-10-28 10:11 UTC (permalink / raw) To: Jeremie Dimino; +Cc: caml-list Jérémie Dimino <jeremie@dimino.org> writes: > On Thu, Oct 28, 2010 at 11:00:59AM +0200, Goswin von Brederlow wrote: >> NUMA has memory dedicated to each core. Each core can access its memory >> fast. Accessing some other cores memory on the other hand is slower (and >> will have to fight for access with that core). > > And so that is a bad solution for Lwt since it runs on one system > thread. Does that matter? The copying would be going on in the kernel and use multiple cores even if the read requests all came from one core I think. >> Nah, if you can detect the error then handing is easy. The problem is >> detecting. Write() itself can fail and that is easy to detect. But >> write() succeeding doesn't mean the data has been written >> successfully. You can still get an error after that which you only get >> on fsync(). > > Take for example buffered channels, when you would have to flush the > buffer, you would launch a thread calling write and let the program > continue to write onto the channel, assuming that the write succeed. But > if one write fails, you would get the error latter. And worst, if the > program stop using the buffer after the flush, it will never get the > error. > > Jérémie True. Your program will have to handle errors. Normaly I would say that a flush/sync on a channel/file has to block the thread and continue only on error or success. Otherwise you would be talking more about barriers and simulate them with flush/sync (since userspace lacks an interface for them). Doing error handling with barriers is indeed more tricky. But I was more thinking of having async writes with error reporting. Say you have 1000 "threads" that need to write something. They all call my_write(). The my_write() function issues an async write request, blocks the "thread" and continues some other "thread". Once the write request has complete the "thread" is woken up and gets a success or failure message. "thread" here doesn't mean an actual system thread. More a logical thread like handling of a specific connection. Even more async would be to pass a callback to the my_write() that gets called on success or failure and have my_write() return directly. Again error hadnling is easy since you have the callback to handle it. So my_write would look something like this: let my_write fd off str fn = ... val my_write : Unix.file_descr Int64.t string (Unix.error option -> unit) -> unit (** Write string <str> to file <fd> at offset <off> and call <fn> with None on success or Some error on failure. *) The program can then decide to continue (do something else after my_write) or to block (put the rest of the code into the callback and return). MfG Goswin ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-25 11:10 ` Jérémie Dimino [not found] ` <AANLkTimP77PDEChW3Yt6uUy_qxYpj6EOZWQ_==id-LBC@mail.gmail.com> @ 2010-10-25 15:58 ` DS 1 sibling, 0 replies; 26+ messages in thread From: DS @ 2010-10-25 15:58 UTC (permalink / raw) To: caml-list On 25 Oct 2010, at 13:10, Jérémie Dimino wrote: > AFAIK libaio is linux-specific If we are talking about aio_read() and company, these functions are available on at least FreeBSD, Irix, Solaris and Mac OS X too. I looked into this some years back and while I would say it is standard unix stuff, I do recall finding a few slight differences between platforms. FreeBSD required loading a kernel module (aio.ko IIRC) to use this, and if it wasn't loaded the program would just die horribly. It can be wrapped as an OCaml module, but expect to put many #ifdef's in your C code. Regards. -DS ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Caml-list] Asynchronous IO programming in OCaml 2010-10-24 10:34 Asynchronous IO programming in OCaml Jon Harrop ` (2 preceding siblings ...) 2010-10-24 16:17 ` Jake Donham @ 2010-10-24 20:42 ` Goswin von Brederlow 3 siblings, 0 replies; 26+ messages in thread From: Goswin von Brederlow @ 2010-10-24 20:42 UTC (permalink / raw) To: Jon Harrop; +Cc: caml-list "Jon Harrop" <jon@ffconsultancy.com> writes: > Is there a tutorial on using something like LWT for asynchronous programming > in OCaml? I'm looking for an example like an echo server that handles > clients concurrently without blocking threads, so it can handle thousands of > clients without significant performance degradation. Sorry, not a tutorial but maybe it gives you some pointers for research. What blockage do you want to remove? 1) sockets You can set sockets to non-blocking and then use Unix.select to wait for activity. But that becomes inefficient if you really want thousands of clients. This is really nothing ocaml specific but the same old select problem as in C. The solution (in C) is to use epoll and I haven't seen an ocaml binding for that yet. Anyone? Short of that mixing threads and select is an option. 2) disk I/O There are (under linux) 3 choices for async disk I/O: a) glibc -> internally threads b) librt -> internally threads c) libaio -> needs O_DIRECT to work on files I've written some bindings for libaio and that works nicely. No idea about that for windows or mac. 3) computations Here you hit a bit of a problem. Ocaml doesn't support multiple cores. (there is a multicore impementation but not standard.) So the best you can do is split computations into little chunks and do time sharing. Or offload jobs to worker threads that are not written in ocaml. What I like here is using CPS (continuation passing style). The idea is that every function gets passed an argument, the continuation, which is to be called next. If you have used Scanf.scanf then it is a bit like that. A function only returns when it blocks, in which case it stores a continuation in some queue to be continued later. A pure echo server can be done fine with just a select loop and any C example is easily translated to ocaml. Maybe try that and then look at a bigger problem and ask more specific questions. MfG Goswin ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2010-10-28 10:11 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-10-24 10:34 Asynchronous IO programming in OCaml Jon Harrop 2010-10-24 12:51 ` [Caml-list] " philippe 2010-10-24 12:52 ` Dario Teixeira 2010-10-24 16:33 ` oliver 2010-10-24 18:50 ` Dario Teixeira 2010-10-24 19:04 ` bluestorm 2010-10-24 20:02 ` oliver 2010-10-24 21:51 ` Michael Ekstrand 2010-10-24 16:17 ` Jake Donham 2010-10-24 20:54 ` Anil Madhavapeddy 2010-10-24 22:50 ` Jérémie Dimino 2010-10-25 3:42 ` Markus Mottl 2010-10-25 7:49 ` Richard Jones 2010-10-25 8:42 ` Goswin von Brederlow 2010-10-25 11:10 ` Jérémie Dimino [not found] ` <AANLkTimP77PDEChW3Yt6uUy_qxYpj6EOZWQ_==id-LBC@mail.gmail.com> [not found] ` <20101025143317.GB32282@aurora> 2010-10-25 15:34 ` Yaron Minsky 2010-10-25 17:26 ` Jérémie Dimino 2010-10-27 9:33 ` Goswin von Brederlow 2010-10-27 11:18 ` Jérémie Dimino 2010-10-27 13:43 ` Goswin von Brederlow 2010-10-27 15:30 ` Jérémie Dimino 2010-10-28 9:00 ` Goswin von Brederlow 2010-10-28 9:28 ` Jérémie Dimino 2010-10-28 10:11 ` Goswin von Brederlow 2010-10-25 15:58 ` DS 2010-10-24 20:42 ` Goswin von Brederlow
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox