I came back to this thread thanks to Alan Schmitt's Caml Weekly News.
Here is a fairly systematic approach to the problem, contrasting Jeremy's more virtuoso approach. It works well when the whole set is statically defined in the source code, but I'm not sure about more dynamic scenarios (when a lot of the type variables involved become existentially quantified.
The idea is to first formulate a solution using logic programming. A link set can be represented as a list of accepted sinks, and I can write a small Prolog program, "insert" that takes a new link, and a link set, and traverses the link set to find the link's sink in the list, and add the link's source to the list in passing.
set([sink]) :- link(source, sink).
set(sinks) :-
set(prev_sinks),
link(source, sink),
insert(prev_sinks, source, sink, sinks).
(* either the sink we want to insert is first in the list *)
insert([sink | _], source, sink, [source, sink | sinks]).
(* or it is to be found somewhere, recursively, under some other sink [s] *)
insert([s | prev_sinks], source, sink, [s | sinks]) :-
insert(prev_sinks, source, sink, sinks).
Once you have this logic program, it is straightforward to translate it to a GADT declaration:
type 's linkset =
| Init : ('source, 'sink) link -> ('source -> ('sink -> unit)) linkset
| Insert : ('source, 'sink) link * 'set1 linkset * ('set1, 'source, 'sink, 'set2) insert -> 'set2 linkset
and ('set1, 'source, 'sink, 'set2) insert =
| Here : (('sink -> _) as 'set1, 'source, 'sink, 'source -> 'set1) insert
| Next : ('set1, 'source, 'sink, 'set2) insert -> ('a -> 'set1, 'source, 'sink, 'a -> 'set2) insert
let _ = fun (type t1 t2 t3) (lnk1 : (t2, t1) link) (lnk2 : (t3, t1) link) (lnk3 : (t3, t2) link) ->
let set = Init lnk1 in
let set = Insert (lnk2, set, Next Here) in
let set = Insert (lnk3, set, Here) in
set