* [Caml-list] OT: Java Performance
From: Brian Hurt @ 2003-05-01 15:27 UTC
To: Ocaml Mailing List

Given the number of performance-related discussions on this mailing list of late, I thought I'd forward this article:

http://www-106.ibm.com/developerworks/java/library/j-jtp04223.html

It's about Java, but I think it's still worthwhile reading for OCaml programmers. The lesson to learn here is that performance is tricky: what you think will obviously be a problem often isn't, and what you think won't be a problem can be. Make it work correctly first, then measure performance, then optimize if necessary.

Another comment that applies to both languages is that if it's a glaringly obvious problem, the compiler people are probably already working on it (or have possibly already solved it).

Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
* [Caml-list] comparison with C performance
From: Lex Stein @ 2003-05-01 17:29 UTC
To: Ocaml Mailing List

Hi,

A while ago I built an NFS server in OCaml (BDBFS) and the performance stunk. It was 10x slower than the BSD in-kernel NFS server for metadata operations. There was some speculation about what was causing this slowness. It could have been a number of things. So in order for my advisor to let me continue programming in OCaml, I set out to show that it wasn't due to the choice of OCaml.

The experiment consisted of 10,000 repeated RPC cycles across a 100Mbps link. An RPC cycle consists of a NULL RPC followed by an RPC carrying a 20-byte and a 24-byte string, which are written to a Berkeley-DB database (via DB->put) with the 20 bytes as key and the 24 bytes as value. The C test never leaves C code and calls directly into the Berkeley-DB C code. The OCaml test leaves C above the RPC layer and enters the OCaml world, using the OCaml Berkeley-DB interface (the one I wrote; I know Yaron Minsky has one too) to write to the database.

The following column shows the time taken by a client (the same client across all 3 test configurations) to execute 100,000 RPC cycles. I ran the experiment 15 times. The square brackets contain the standard deviation. The units are seconds.

Test run at 5:00am 04-27-2003

                   100,000 RPC cycs
C shunt:           22.87s [1.20s]
OCaml shunts:
  bytecode:        23.87s [0.96s]
  native:          22.20s [0.98s]

The result is that C and OCaml native, and C and OCaml bytecode, are not differentiable, given the standard deviations. OCaml bytecode and native are differentiable, being more than one standard deviation away from each other.
To get back to the original story: this has pointed me in the direction of improving BDBFS' performance by improving the efficiency of the directory listing and lookup algorithms rather than changing languages. OCaml seems to fare just fine against C.

Lex
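The "more than one standard deviation apart" comparison in Lex's message can be checked mechanically. The sketch below is only an illustration: the message does not say which standard deviation is used when the two differ, so it assumes the larger of the two, which reproduces the conclusions stated in the message. The names `results` and `differentiable` are hypothetical.

```python
# Reported timings in seconds for 100,000 RPC cycles: (mean, standard deviation).
results = {
    "C": (22.87, 1.20),
    "OCaml bytecode": (23.87, 0.96),
    "OCaml native": (22.20, 0.98),
}


def differentiable(a, b):
    """ASSUMED criterion: the means differ by more than the larger std dev."""
    (mean_a, sd_a), (mean_b, sd_b) = results[a], results[b]
    return abs(mean_a - mean_b) > max(sd_a, sd_b)


pairs = [
    ("C", "OCaml native"),
    ("C", "OCaml bytecode"),
    ("OCaml bytecode", "OCaml native"),
]
for a, b in pairs:
    print(f"{a} vs {b}: differentiable = {differentiable(a, b)}")
```

Under this assumption only the bytecode/native pair comes out as differentiable. Had the smaller standard deviation been used instead, C and bytecode (1.00s apart against 0.96s) would also count, so the choice of criterion matters for that pair.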
* Re: [Caml-list] comparison with C performance
From: Miles Egan @ 2003-05-01 17:55 UTC
To: Lex Stein; +Cc: Ocaml Mailing List

On Thu, 2003-05-01 at 10:29, Lex Stein wrote:
> Hi,
>
> A while ago I built an NFS server in OCaml (BDBFS) and the performance
> stunk. It was 10x slower than the BSD in-kernel NFS server for metadata
> operations. There was some speculation about what was causing this
> slowness. It could have been a number of things. So in order for my
> Advisor to let me continue programming in OCaml, I set out to show that it
> wasn't due to the choice of OCaml.

Wouldn't you expect any userspace nfs server to be much slower than the kernel-based implementation due to the overhead of all the extra context-switching?

--
Miles Egan <miles@caddr.com>
* Re: [Caml-list] comparison with C performance
From: Lex Stein @ 2003-05-01 18:24 UTC
To: Ocaml Mailing List

Yes, there will be additional context switch costs for a user-land implementation. However, where a disk I/O costs a luxury yacht, a context switch might cost a used bicycle. So I think filesystem designers are in the position of not worrying about the old bike, because it's best to focus negotiating efforts on the yacht. So I guess the question on our minds was: is OCaml another luxury yacht?

(With the NFS metadata operations in BDBFS there were synchronous I/O operations on the path. These make a context switch insignificant. Consider the milliseconds required for an I/O.)

To narrow the experiment to isolating the language cost, I eliminated the synchronous I/O by placing the DB->put()s outside of a transaction, with no commit.

As I'm sure you realised, all of the C and OCaml native and bytecode experiments were run in user-land, so all had additional context switches above a kernel-level implementation. However, given I/O costs in filesystems, context switch costs are insignificant.

Lex
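To put the "milliseconds required for an I/O" remark next to the measured numbers, the totals reported earlier in the thread can be turned into an average time per RPC cycle. This is a rough sketch only: it assumes the 100,000-cycle count stated alongside the results, and the names `mean_total_s` and `per_cycle_us` are hypothetical.

```python
CYCLES = 100_000  # cycle count stated alongside the reported results

# Mean wall-clock time in seconds for the whole run, as reported in the thread.
mean_total_s = {"C": 22.87, "OCaml bytecode": 23.87, "OCaml native": 22.20}


def per_cycle_us(total_s, cycles=CYCLES):
    """Average time per RPC cycle, in microseconds."""
    return total_s / cycles * 1e6


for name, total in mean_total_s.items():
    print(f"{name}: {per_cycle_us(total):.1f} us per cycle")
```

All three configurations average well under a millisecond per cycle, which is at least consistent with the synchronous I/O having been taken off the path.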
* Re: [Caml-list] comparison with C performance
From: Miles Egan @ 2003-05-01 18:48 UTC
To: Lex Stein; +Cc: Ocaml Mailing List

On Thu, 2003-05-01 at 11:24, Lex Stein wrote:
> Yes, there will be additional context switch costs for a user-land
> implementation. However, where a disk I/O costs a luxury yacht a context
> switch might cost a used bicycle. So I think filesystem designers are in
> the position of not worrying about the old bike because it's best to focus
> negotiating efforts on the yacht. So I guess the question on our mind was;
> is OCaml another luxury yacht?

Your basic argument is reasonable, but I seem to remember that one of the main reasons the previously userland Linux NFS server implementation was rewritten as a kernel-space server was to improve performance. Perhaps it's because the typical NFS server serves most of its pages out of its RAM cache, so context switches become more of an issue?

Anyway, drifting off-topic for this list.

--
Miles Egan <miles@caddr.com>
* Re: [Caml-list] comparison with C performance
From: Lex Stein @ 2003-05-01 18:38 UTC
To: Ocaml Mailing List

My short answer is: No.

Thanks
Lex

> Wouldn't you expect any userspace nfs server to be much slower than the
> kernel-based implementation due to the overhead of all the extra
> context-switching?
>
> --
> Miles Egan <miles@caddr.com>
* Re: [Caml-list] comparison with C performance
From: Chet Murthy @ 2003-04-27 19:04 UTC
To: Lex Stein; +Cc: Ocaml Mailing List

Hmmm .. Lex, are you aware of Ensemble? Mark Hayden basically proved that if you properly manage memory and a few other things, well, you can be faster than C, unless the C program is (ahem) trivial.

Some more details:

Mark showed that for a rather complicated network protocol stack, a CAML implementation was a *lot* faster than a highly-optimized C implementation. The key things he was able to do were:

(a) since it's in ML, you can be a lot more aggressive about optimization

(b) effective memory-management of buffers in ML -- don't just leave it to the GC

(c) serious reliance on the inliner

There were a few other things, but this is a good start.

Cheers,
--chet--
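Point (b) above, managing buffers explicitly instead of leaving them to the collector, is often done with a pool of preallocated buffers that are reused for each message. The sketch below illustrates only that general idea. It is not Ensemble's code, it is written in Python rather than ML purely for illustration, and the class `BufferPool` and its methods are hypothetical.

```python
class BufferPool:
    """Hypothetical fixed-size buffer pool: buffers are allocated once and reused."""

    def __init__(self, count, size):
        self._size = size
        self._free = [bytearray(size) for _ in range(count)]
        self.allocations = count  # number of buffers ever allocated

    def acquire(self):
        # Reuse a free buffer when possible; allocate only if the pool is empty.
        if self._free:
            return self._free.pop()
        self.allocations += 1
        return bytearray(self._size)

    def release(self, buf):
        # Hand the buffer back for reuse instead of dropping it for the collector.
        self._free.append(buf)


pool = BufferPool(count=4, size=2048)
for i in range(1000):
    buf = pool.acquire()
    buf[0:4] = i.to_bytes(4, "big")  # stand-in for filling in a packet
    pool.release(buf)
print(pool.allocations)
```

In this loop every message reuses one of the four buffers created up front, so the steady state creates no new garbage for the buffers themselves. Whether this pays off in a given program is something to measure rather than assume.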
* Re: [Caml-list] comparison with C performance
From: Brian Hurt @ 2003-05-01 19:08 UTC
To: Lex Stein; +Cc: Ocaml Mailing List

From memory, task switches on the 386 were 300-500 clock cycles. By the time of the Pentium, the nominal cost of a task switch was ~50 cycles IIRC, but this did not include the costs of the TLB and cache flushes. This raised the question of how the amount of work you did after the TLB flush determined how expensive the task switch really was (and do those costs count as task switch costs anyway?). I don't believe it's been significantly improved since then.

Task switches can be a performance problem. This is, at heart, the problem with microkernel operating systems. Done "canonically", you are task switching constantly. Especially back in the day of the Torvalds-Tanenbaum debate, the task switch cost ate you alive.

The successful microkernels generally did without memory protection. An example here is the Amiga kernel: a microkernel, granted, but no memory protection either. Several realtime OSs do the same stunt.

Or, in a slightly less extreme way, you can just move more stuff into the same task, reducing the number of task switches you need to make. This is the choice Microsoft made with NT, when they moved the core graphics routines into the kernel with NT4. I find it humorous that the "microkernel" NT has graphics in the kernel, while the "monolithic kernel" Linux keeps graphics in a user space application (X). But by pulling functions into the same task space, A) you are losing a number of advantages of microkernels (for example, a misbehaving driver can now crash the kernel), and B) you are starting to look an awful lot like a monolithic kernel.

The successful kernels today are actually hybrids of monolithic and microkernel, to one extent or another, at this point.

On the other hand, task switching isn't nearly the cost of I/O (disk or network), which I would expect to dominate. That being said, limiting task switches is not the only plausible optimization an in-kernel NFS server could implement. I haven't investigated this code, but some plausible explanations include interrupt/signal latency, scheduling advantages, fewer address mappings/reverse mappings, etc.

Brian
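A back-of-the-envelope calculation makes the comparison between task switches and I/O concrete. The cycle counts below are the from-memory figures quoted above. The clock rate and the disk latency are assumptions picked only for illustration, not measurements, and the nominal cycle counts exclude the TLB and cache flush costs mentioned above. The names `switch_time_us`, `ASSUMED_CLOCK_MHZ` and `ASSUMED_DISK_IO_MS` are hypothetical.

```python
def switch_time_us(cycles, clock_mhz):
    """Time for one task switch in microseconds (a 1 MHz clock gives 1 cycle per us)."""
    return cycles / clock_mhz


# ASSUMED values, chosen only for illustration.
ASSUMED_CLOCK_MHZ = 100.0   # hypothetical CPU clock rate
ASSUMED_DISK_IO_MS = 10.0   # hypothetical latency of one synchronous disk I/O

for cycles in (500, 50):  # upper 386 figure and nominal Pentium figure from the message
    t_us = switch_time_us(cycles, ASSUMED_CLOCK_MHZ)
    switches_per_io = ASSUMED_DISK_IO_MS * 1000.0 / t_us
    print(f"{cycles} cycles: {t_us:.1f} us per switch, "
          f"about {switches_per_io:.0f} switches per disk I/O")
```

Under these assumed values a single disk I/O costs as much as thousands of nominal task switches. The picture changes once requests are served from memory, or once flush costs are counted in the switch itself.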
* Re: [Caml-list] comparison with C performance
From: Eray Ozkural @ 2003-05-01 19:13 UTC
To: Lex Stein, Ocaml Mailing List

On Thursday 01 May 2003 20:29, Lex Stein wrote:
> Hi,
>
> A while ago I built an NFS server in OCaml (BDBFS) and the performance
> stunk. It was 10x slower than the BSD in-kernel NFS server for metadata
> operations. There was some speculation about what was causing this
> slowness. It could have been a number of things. So in order for my
> Advisor to let me continue programming in OCaml, I set out to show that it
> wasn't due to the choice of OCaml.

Too bad you sucked at writing nfs servers ;)

Don't worry, system-level stuff can get frustrating, and there is always a stack of architectural issues that one must be wary of.

Happy hacking,

--
Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr>
Comp. Sci. Dept., Bilkent University, Ankara
KDE Project: http://www.kde.org
www: http://www.cs.bilkent.edu.tr/~erayo
Malfunction: http://mp3.com/ariza
GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C