* ocamlgraph predecessors @ 2009-08-08 5:24 Benjamin Ylvisaker 2009-08-08 13:35 ` [Caml-list] " Edgar Friendly ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Benjamin Ylvisaker @ 2009-08-08 5:24 UTC (permalink / raw) To: caml-list I have been using ocamlgraph for a while, and have been generally happy with it. I experienced some poor performance with moderately large graphs (10-100k vertices) recently, which led me to look through the source code a little. It seems that doing anything with the predecessors of a vertex, even just getting a list of them, requires scanning through all the vertices in a graph. This seems a little crazy to me. Am I missing something? Is there some kind of work- around that gives reasonable performance for predecessor operations (i.e. not O(|V|)). Thanks, Ben ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] ocamlgraph predecessors 2009-08-08 5:24 ocamlgraph predecessors Benjamin Ylvisaker @ 2009-08-08 13:35 ` Edgar Friendly 2009-08-08 20:16 ` Benjamin Ylvisaker 2009-08-25 14:22 ` Julien Signoles 2009-08-26 6:54 ` Jean-Christophe Filliâtre 2 siblings, 1 reply; 13+ messages in thread From: Edgar Friendly @ 2009-08-08 13:35 UTC (permalink / raw) To: Benjamin Ylvisaker; +Cc: caml-list Benjamin Ylvisaker wrote: > I have been using ocamlgraph for a while, and have been generally happy > with it. I experienced some poor performance with moderately large > graphs (10-100k vertices) recently, which led me to look through the > source code a little. It seems that doing anything with the > predecessors of a vertex, even just getting a list of them, requires > scanning through all the vertices in a graph. This seems a little crazy > to me. Am I missing something? Is there some kind of work-around that > gives reasonable performance for predecessor operations (i.e. not O(|V|)). > > Thanks, > Ben > What you're asking is similar to the problem of finding the predecessor of an arbitrary node in a singly-linked-list. You have no option but to scan the whole list to find its predecessor. If you had a doubly-linked-list, predecessor lookups would work easily, but that's a different data structure, with much more overhead. When you talk about "predecessors", I assume you're using a directed graph, and want to know which nodes have edges to a given node. if your graph is static, you could build lookup tables, pregenerating this information and caching it. Even with a dynamic graph, maintaining lookup tables on this info shouldn't be too hard. Does that help? E ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] ocamlgraph predecessors 2009-08-08 13:35 ` [Caml-list] " Edgar Friendly @ 2009-08-08 20:16 ` Benjamin Ylvisaker 2009-08-09 14:56 ` Edgar Friendly 0 siblings, 1 reply; 13+ messages in thread From: Benjamin Ylvisaker @ 2009-08-08 20:16 UTC (permalink / raw) To: Edgar Friendly; +Cc: caml-list On Aug 8, 2009, at 6:35 AM, Edgar Friendly wrote: > Benjamin Ylvisaker wrote: >> I have been using ocamlgraph for a while, and have been generally >> happy >> with it. I experienced some poor performance with moderately large >> graphs (10-100k vertices) recently, which led me to look through the >> source code a little. It seems that doing anything with the >> predecessors of a vertex, even just getting a list of them, requires >> scanning through all the vertices in a graph. This seems a little >> crazy >> to me. Am I missing something? Is there some kind of work-around >> that >> gives reasonable performance for predecessor operations (i.e. not >> O(|V|)). >> >> Thanks, >> Ben >> > > What you're asking is similar to the problem of finding the > predecessor > of an arbitrary node in a singly-linked-list. You have no option > but to > scan the whole list to find its predecessor. If you had a > doubly-linked-list, predecessor lookups would work easily, but > that's a > different data structure, with much more overhead. > > When you talk about "predecessors", I assume you're using a directed > graph, and want to know which nodes have edges to a given node. if > your > graph is static, you could build lookup tables, pregenerating this > information and caching it. Even with a dynamic graph, maintaining > lookup tables on this info shouldn't be too hard. > > Does that help? > E The list analogy seems like a little bit of a stretch to me. I understand the point, but I think most programmers would expect predecessor and successor operations in a generic mutable directed graph library to be symmetric in every way, including performance. I'm thinking about making a thin wrapper around ocamlgraph that makes "internal" edges in both directions with tags to distinguish them whenever the user creates an "external" edge. All the wrapper graph traversal functions would only use ocamlgraph's successor functions, and then use the tags to distinguish which edges are really supposed to point in which direction. It's a bit of a hassle, and will roughly double the amount of storage required for edges, but I need predecessor access in my application, and the O(|V|) performance is really painful for big graphs. Ben ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] ocamlgraph predecessors 2009-08-08 20:16 ` Benjamin Ylvisaker @ 2009-08-09 14:56 ` Edgar Friendly 2009-08-25 14:22 ` Julien Signoles 0 siblings, 1 reply; 13+ messages in thread From: Edgar Friendly @ 2009-08-09 14:56 UTC (permalink / raw) To: Benjamin Ylvisaker; +Cc: caml-list Benjamin Ylvisaker wrote: > > On Aug 8, 2009, at 6:35 AM, Edgar Friendly wrote: > >> Benjamin Ylvisaker wrote: >>> I have been using ocamlgraph for a while, and have been generally happy >>> with it. I experienced some poor performance with moderately large >>> graphs (10-100k vertices) recently, which led me to look through the >>> source code a little. It seems that doing anything with the >>> predecessors of a vertex, even just getting a list of them, requires >>> scanning through all the vertices in a graph. This seems a little crazy >>> to me. Am I missing something? Is there some kind of work-around that >>> gives reasonable performance for predecessor operations (i.e. not >>> O(|V|)). >>> >>> Thanks, >>> Ben >>> >> >> What you're asking is similar to the problem of finding the predecessor >> of an arbitrary node in a singly-linked-list. You have no option but to >> scan the whole list to find its predecessor. If you had a >> doubly-linked-list, predecessor lookups would work easily, but that's a >> different data structure, with much more overhead. >> >> When you talk about "predecessors", I assume you're using a directed >> graph, and want to know which nodes have edges to a given node. if your >> graph is static, you could build lookup tables, pregenerating this >> information and caching it. Even with a dynamic graph, maintaining >> lookup tables on this info shouldn't be too hard. >> >> Does that help? >> E > > The list analogy seems like a little bit of a stretch to me. I > understand the point, but I think most programmers would expect > predecessor and successor operations in a generic mutable directed graph > library to be symmetric in every way, including performance. > If you'd like to call this a weakness in the current implementation, you may. > I'm thinking about making a thin wrapper around ocamlgraph that makes > "internal" edges in both directions with tags to distinguish them > whenever the user creates an "external" edge. All the wrapper graph > traversal functions would only use ocamlgraph's successor functions, and > then use the tags to distinguish which edges are really supposed to > point in which direction. It's a bit of a hassle, and will roughly > double the amount of storage required for edges, but I need predecessor > access in my application, and the O(|V|) performance is really painful > for big graphs. > > Ben > This is another solution to the slow predecessor performance, and will have different performance characteristics than predecessor lookup-tables. Note that the lookup table solution is isomorphic to building a second graph with all the arrows reversed, and using the efficient successor operations on it. Maybe this'll be easier than keeping a merged graph. Maybe not. E ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] ocamlgraph predecessors 2009-08-09 14:56 ` Edgar Friendly @ 2009-08-25 14:22 ` Julien Signoles 0 siblings, 0 replies; 13+ messages in thread From: Julien Signoles @ 2009-08-25 14:22 UTC (permalink / raw) To: Edgar Friendly; +Cc: caml list Edgar Friendly a écrit : > This is another solution to the slow predecessor performance, and will > have different performance characteristics than predecessor > lookup-tables. Note that the lookup table solution is isomorphic to > building a second graph with all the arrows reversed, and using the > efficient successor operations on it. Maybe this'll be easier than > keeping a merged graph. Maybe not. This solution should be easy to implement with ocamlgraph because this operation already exists in ocamlgraph : that's the "mirror" function. See here : http://ocamlgraph.lri.fr/doc/Oper.Make.html -- Julien Signoles ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] ocamlgraph predecessors 2009-08-08 5:24 ocamlgraph predecessors Benjamin Ylvisaker 2009-08-08 13:35 ` [Caml-list] " Edgar Friendly @ 2009-08-25 14:22 ` Julien Signoles 2009-08-26 14:35 ` Benjamin Ylvisaker 2009-08-26 6:54 ` Jean-Christophe Filliâtre 2 siblings, 1 reply; 13+ messages in thread From: Julien Signoles @ 2009-08-25 14:22 UTC (permalink / raw) To: Benjamin Ylvisaker; +Cc: caml-list Benjamin Ylvisaker a écrit : > I have been using ocamlgraph for a while, and have been generally happy > with it. I experienced some poor performance with moderately large > graphs (10-100k vertices) recently, which led me to look through the > source code a little. It seems that doing anything with the > predecessors of a vertex, even just getting a list of them, requires > scanning through all the vertices in a graph. This seems a little crazy > to me. Am I missing something? Is there some kind of work-around that > gives reasonable performance for predecessor operations (i.e. not O(|V|)). Actually, looking at the current implementation, accessing predecessors is worse that O(|V|): that is max(O(|V|,O(|E|)). If you use concrete (imperative directional) graphs, the simpler work-around is to use Imperative.Digraph.ConcreteBidirectional as suggested by Kevin Cheung. It uses more memory space (at worse the double) that standard concrete directional graphs. But accessing predecessors is in O(1) amortized instead of max(O(|V|,O(|E|)) and removing a vertex is in O(D*ln(D)) where D is the maximal degree of the graph instead of O(|V|*ln(|V|)). If you don't use this functor, other work-arounds have been suggested in other posts. By the way contributing to ocamlgraph by adding Imperative.Digraph.AbstractBidirectional (for instance) is still possible and welcome :o). -- Julien Signoles ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] ocamlgraph predecessors 2009-08-25 14:22 ` Julien Signoles @ 2009-08-26 14:35 ` Benjamin Ylvisaker 0 siblings, 0 replies; 13+ messages in thread From: Benjamin Ylvisaker @ 2009-08-26 14:35 UTC (permalink / raw) To: Julien Signoles; +Cc: caml-list On Aug 25, 2009, at 10:22 AM, Julien Signoles wrote: > Benjamin Ylvisaker a écrit : >> I have been using ocamlgraph for a while, and have been generally >> happy with it. I experienced some poor performance with moderately >> large graphs (10-100k vertices) recently, which led me to look >> through the source code a little. It seems that doing anything >> with the predecessors of a vertex, even just getting a list of >> them, requires scanning through all the vertices in a graph. This >> seems a little crazy to me. Am I missing something? Is there some >> kind of work-around that gives reasonable performance for >> predecessor operations (i.e. not O(|V|)). > > Actually, looking at the current implementation, accessing > predecessors is worse that O(|V|): that is max(O(|V|,O(|E|)). > > If you use concrete (imperative directional) graphs, the simpler > work-around is to use Imperative.Digraph.ConcreteBidirectional as > suggested by Kevin Cheung. It uses more memory space (at worse the > double) that standard concrete directional graphs. But accessing > predecessors is in O(1) amortized instead of max(O(|V|,O(|E|)) and > removing a vertex is in O(D*ln(D)) where D is the maximal degree of > the graph instead of O(|V|*ln(|V|)). > > If you don't use this functor, other work-arounds have been > suggested in other posts. > > By the way contributing to ocamlgraph by adding > Imperative.Digraph.AbstractBidirectional (for instance) is still > possible and welcome :o). Thanks for your suggestions. I had not noticed the ConcreteBidirectional module, but it looks like it wouldn't be a drop- in replacement for me, because it's unlabeled and I need labels. If anyone is curious, here is the wrapper logic that I ended up adding: When the user wants an edge from v1 to v2 with label X, two "internal" edges get created: one from v1 to v2 with label EdgeForward (X) and one from v2 to v1 with label EdgeBackward (X). These two edges are considered equivalent by the wrapper code. Additionally, to made edge removal faster, I added a table that maps an edge to its "pair". All edge-related wrapper functions can take either of the pair of wrapper edges, and will use the source or destination vertex of the edge, depending on the Forward/Backward label. The edge removal wrapper function gets the mate of the passed in edge by looking it up in the table, and removes them both. To make vertex removal faster, I added a Boolean "removed" field to vertex labels that is set to false on vertex creation. When the wrapper vertex removal function is called, it removes all the incident edges and then just sets the vertex's removed flag to true. Vertex scanning functions clearly need to check the flag to determine whether an "internal" vertex should actually be considered part of the graph or not. If there is a lot of vertex creation and removal in an application, clearly a lot of "garbage" vertices will end up polluting the graph. When the amount of garbage gets too large, a new copy of a graph can be constructed with only the "real" vertices copied over. This is not a totally transparent operation, because if there are any tables keyed on vertices or edges, external to the graph itself, they'll get confused by the copying. This set of wrapper logic clearly adds a non-trivial amount of memory overhead: 1) The edge and vertex labels are a little bit bigger. 2) There are two edges for every externally visible edge. 3) The edge- pair table adds another O(|E|) chunk of memory. 4) An application- dependent number of garbage vertices will be floating around. I'm pretty sure all the wrapper operations are asymptotically as fast as they reasonably can be, though. If an implementation like this were done in the library itself (notice the strategic use of the passive voice) instead of as a wrapper, I'm pretty sure vertex removal could be handled more cleanly. Ben ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] ocamlgraph predecessors 2009-08-08 5:24 ocamlgraph predecessors Benjamin Ylvisaker 2009-08-08 13:35 ` [Caml-list] " Edgar Friendly 2009-08-25 14:22 ` Julien Signoles @ 2009-08-26 6:54 ` Jean-Christophe Filliâtre 2 siblings, 0 replies; 13+ messages in thread From: Jean-Christophe Filliâtre @ 2009-08-26 6:54 UTC (permalink / raw) To: Benjamin Ylvisaker; +Cc: caml-list Hi, Benjamin Ylvisaker wrote: > I have been using ocamlgraph for a while, and have been generally happy > with it. I experienced some poor performance with moderately large > graphs (10-100k vertices) recently, which led me to look through the > source code a little. It seems that doing anything with the > predecessors of a vertex, even just getting a list of them, requires > scanning through all the vertices in a graph. This seems a little crazy > to me. Am I missing something? Is there some kind of work-around that > gives reasonable performance for predecessor operations (i.e. not O(|V|)). Not providing predecessors in constant time was a deliberate choice in Ocamlgraph. (A graph is basically a map which binds any vertex to the set of its successors, and that's it.) If you need efficient access to the predecessors, you have several workarounds: - implement your own graph data structure; after all, ocamlgraph was designed to clearly separate data structures and algorithms, so that you will still be able to use graph algorithms on your own graphs. - use the graph data structure Imperative.Digraph.ConcreteBidirectional, which is the only graph data structure in Ocamlgraph providing predecessors in constant time; it is actually the contribution of a user (Ted Kremenek) who experienced the same need as yourself. - memoize the results of the predecessors function (either in a modified version of the data structure or externally if your algorithm allows it). Hope this helps, -- Jean-Christophe ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] ocamlgraph predecessors
@ 2009-08-09 18:32 rixed
2009-08-10 1:59 ` Jacques Garrigue
0 siblings, 1 reply; 13+ messages in thread
From: rixed @ 2009-08-09 18:32 UTC (permalink / raw)
To: caml-list
> What you're asking is similar to the problem of finding the predecessor
> of an arbitrary node in a singly-linked-list. You have no option but to
> scan the whole list to find its predecessor. If you had a
> doubly-linked-list, predecessor lookups would work easily, but that's a
> different data structure, with much more overhead.
Much more overhead, really ?
So this is for performance reasons that all functionnal languages
promote singly-linked lists, while for instance in Linux every list
is implemented with a doubly linked list for purely ideological reasons ?
:-)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] ocamlgraph predecessors 2009-08-09 18:32 rixed @ 2009-08-10 1:59 ` Jacques Garrigue 2009-08-10 12:19 ` kcheung 2009-08-12 22:40 ` Francis Dupont 0 siblings, 2 replies; 13+ messages in thread From: Jacques Garrigue @ 2009-08-10 1:59 UTC (permalink / raw) To: rixed; +Cc: caml-list From: rixed@happyleptic.org > > What you're asking is similar to the problem of finding the predecessor > > of an arbitrary node in a singly-linked-list. You have no option but to > > scan the whole list to find its predecessor. If you had a > > doubly-linked-list, predecessor lookups would work easily, but that's a > > different data structure, with much more overhead. > > Much more overhead, really ? > So this is for performance reasons that all functionnal languages > promote singly-linked lists, while for instance in Linux every list > is implemented with a doubly linked list for purely ideological reasons ? > > :-) Yes indeed, much more overhead. But the source is not the fact you have to maintain backlinks, but their impact on the GC. With a GC, any modification on existing values has a cost, since you have to keep track of them independently of the value itself. Since linux has no GC, using doubly linked lists has only a very limited cost, mostly related to the extra space needed. By the way, BSD uses lots of singly-linked lists, probably because it comes from a time when there was not so much memory. Jacques Garrigue ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] ocamlgraph predecessors 2009-08-10 1:59 ` Jacques Garrigue @ 2009-08-10 12:19 ` kcheung 2009-08-10 17:46 ` Benjamin Ylvisaker 2009-08-12 22:40 ` Francis Dupont 1 sibling, 1 reply; 13+ messages in thread From: kcheung @ 2009-08-10 12:19 UTC (permalink / raw) To: caml-list Perhaps something like that in ConcreteBidirectional be implemented for general Digraph so predecessors can be accessed in O(1)? If I am not mistaken, that will double the storage and running time of most of the operations. This implementation could be added as an additional variant without modifying existing code. Kevin Cheung. > From: rixed@happyleptic.org >> > What you're asking is similar to the problem of finding the >> predecessor >> > of an arbitrary node in a singly-linked-list. You have no option but >> to >> > scan the whole list to find its predecessor. If you had a >> > doubly-linked-list, predecessor lookups would work easily, but that's >> a >> > different data structure, with much more overhead. >> >> Much more overhead, really ? >> So this is for performance reasons that all functionnal languages >> promote singly-linked lists, while for instance in Linux every list >> is implemented with a doubly linked list for purely ideological reasons >> ? >> >> :-) > > Yes indeed, much more overhead. But the source is not the fact you > have to maintain backlinks, but their impact on the GC. > With a GC, any modification on existing values has a cost, since you > have to keep track of them independently of the value itself. > Since linux has no GC, using doubly linked lists has only a very > limited cost, mostly related to the extra space needed. > By the way, BSD uses lots of singly-linked lists, probably because it > comes from a time when there was not so much memory. > > Jacques Garrigue > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] ocamlgraph predecessors 2009-08-10 12:19 ` kcheung @ 2009-08-10 17:46 ` Benjamin Ylvisaker 0 siblings, 0 replies; 13+ messages in thread From: Benjamin Ylvisaker @ 2009-08-10 17:46 UTC (permalink / raw) To: kcheung; +Cc: caml-list In further performance debugging, I discovered that vertex removal is also an O(|V|) operation. This makes sense given the fact that accessing predecessors is O(|V|), because the predecessor links have to be removed in order to properly remove a vertex. Stepping back, however, I still think it's a bad default for a generic mutable graph library. The hack I came up with for this one is even less elegant that the predecessors issue. I added a set of removed vertices to my wrapper graph type, and changed the wrapper vertex removal function to remove all the relevant edges and put the vertex into the removed set. Vertex member/iter/fold operations need to check the removed set to determine whether a vertex is actually in the graph or not. Periodically, when the set of removed vertices has gotten large, I make a new graph and copy in just the "real" vertices from the old graph. If the library were to include a version that maintains back links, it would incur the space overhead and (small in my opinion) time overhead of my predecessors wrapper hack. It could dispense with the vertex removal hack, though, because the back links could be deleted directly. Ben On Aug 10, 2009, at 5:19 AM, kcheung@math.carleton.ca wrote: > Perhaps something like that in ConcreteBidirectional > be implemented for general Digraph so predecessors > can be accessed in O(1)? If I am not mistaken, that > will double the storage and running time of most of > the operations. This implementation could be added > as an additional variant without modifying existing > code. > > Kevin Cheung. > >> From: rixed@happyleptic.org >>>> What you're asking is similar to the problem of finding the >>> predecessor >>>> of an arbitrary node in a singly-linked-list. You have no option >>>> but >>> to >>>> scan the whole list to find its predecessor. If you had a >>>> doubly-linked-list, predecessor lookups would work easily, but >>>> that's >>> a >>>> different data structure, with much more overhead. >>> >>> Much more overhead, really ? >>> So this is for performance reasons that all functionnal languages >>> promote singly-linked lists, while for instance in Linux every list >>> is implemented with a doubly linked list for purely ideological >>> reasons >>> ? >>> >>> :-) >> >> Yes indeed, much more overhead. But the source is not the fact you >> have to maintain backlinks, but their impact on the GC. >> With a GC, any modification on existing values has a cost, since you >> have to keep track of them independently of the value itself. >> Since linux has no GC, using doubly linked lists has only a very >> limited cost, mostly related to the extra space needed. >> By the way, BSD uses lots of singly-linked lists, probably because it >> comes from a time when there was not so much memory. >> >> Jacques Garrigue >> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Caml-list] ocamlgraph predecessors 2009-08-10 1:59 ` Jacques Garrigue 2009-08-10 12:19 ` kcheung @ 2009-08-12 22:40 ` Francis Dupont 1 sibling, 0 replies; 13+ messages in thread From: Francis Dupont @ 2009-08-12 22:40 UTC (permalink / raw) To: Jacques Garrigue; +Cc: rixed, caml-list In your previous mail you wrote: By the way, BSD uses lots of singly-linked lists, probably because it comes from a time when there was not so much memory. => no, BSDs use an include ([/usr/include]/sys/queue.h) which provides C macros for singly-linked lists, singly-linked tail queues, lists and tail queues. So programmers just select the best data structure for each job (and as BSD programmers are skilled programmers they use signly-linked lists as soon as they don't need a list feature). Note this is not very bound to memory: in many cases there are some structures with better memory use, for instance lists in arrays from Python (standard C implementation). Thanks Francis.Dupont@fdupont.fr PS: just type "man queue" on a BSD (including MacOS X). If it is not enough try "man tree" (but not yet on my MacOS X, sorry)... ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-08-26 14:35 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-08-08 5:24 ocamlgraph predecessors Benjamin Ylvisaker 2009-08-08 13:35 ` [Caml-list] " Edgar Friendly 2009-08-08 20:16 ` Benjamin Ylvisaker 2009-08-09 14:56 ` Edgar Friendly 2009-08-25 14:22 ` Julien Signoles 2009-08-25 14:22 ` Julien Signoles 2009-08-26 14:35 ` Benjamin Ylvisaker 2009-08-26 6:54 ` Jean-Christophe Filliâtre 2009-08-09 18:32 rixed 2009-08-10 1:59 ` Jacques Garrigue 2009-08-10 12:19 ` kcheung 2009-08-10 17:46 ` Benjamin Ylvisaker 2009-08-12 22:40 ` Francis Dupont
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox