From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id KAA07857; Wed, 9 Apr 2003 10:15:04 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id KAA07792 for ; Wed, 9 Apr 2003 10:15:02 +0200 (MET DST) Received: from fichte.ai.univie.ac.at (fichte.ai.univie.ac.at [131.130.174.156]) by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id h398F1920375 for ; Wed, 9 Apr 2003 10:15:01 +0200 (MET DST) Received: from fichte.ai.univie.ac.at (markus@localhost.ai.univie.ac.at [127.0.0.1]) by fichte.ai.univie.ac.at (8.12.3/8.12.3/Debian-6.3) with ESMTP id h398ErgN019344; Wed, 9 Apr 2003 10:14:53 +0200 Received: (from markus@localhost) by fichte.ai.univie.ac.at (8.12.3/8.12.3/Debian-6.3) id h398EpsQ019343; Wed, 9 Apr 2003 10:14:51 +0200 Date: Wed, 9 Apr 2003 10:14:51 +0200 From: Markus Mottl To: Yang Shouxun Cc: caml-list@inria.fr Subject: Re: [Caml-list] stack overflow Message-ID: <20030409081451.GA18772@mail4.ai.univie.ac.at> Mail-Followup-To: Yang Shouxun , caml-list@inria.fr References: <200304091045.25094.yangsx@fltrp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200304091045.25094.yangsx@fltrp.com> User-Agent: Mutt/1.3.28i Organization: Austrian Research Institute for Artificial Intelligence X-Spam: no; 0.00; caml-list:01 shouxun:01 heared:01 dataset:01 datasets:01 passing:01 fishy:01 debugger:01 closure:01 overflow:02 continuation:02 mottl:02 native-code:02 tree:02 unix:02 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Wed, 09 Apr 2003, Yang Shouxun wrote: > Yes, the decision tree building function is not tail recursive. I heared > people saying C4.5 (in C) also has stack overflow problem when the training > dataset becomes very large. I can't imagine that this is the problem: either the data is well-distributed, in which case the stack size will grow roughly logarithmically with the size of the data due to partitioning. And if not, the maximum depth of the tree is limited by the number of available input variables anyway. You'd need many, many thousands of those before this becomes a problem, which even large, industrial datasets that I know do not exceed. > I don't know how to write a tail recursive version to build trees. > If there are not that many continuous attributes and the dataset is > not so large, the tree stops growing before stack overflow. The trick is to use continuation passing style (CPS): you pass a function closure (continuation) containing everything that's needed in subsequent computations. Instead of returning a result, the sub-function calls the continuation with the result, which makes the functions tail-recursive. But anyway, I think there must be some fishy operation going on. Why not use the debugger to find out? Or even better: send a link to the code :-) > Can one know the maximal number of calls before it overflow the stack? It depends: byte-code uses its own stack, which you can query using the Gc-module. Otherwise, for native-code, call the ulimit-program (Unix), which displays resource limits including stack usage or interface to the system call "getrlimit". In any case, it would be interesting to see your code. Are you going to release it under some free license? Regards, Markus Mottl -- Markus Mottl http://www.oefai.at/~markus markus@oefai.at ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners