From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA29223; Fri, 25 Jun 2004 19:22:17 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id TAA29541 for ; Fri, 25 Jun 2004 19:22:16 +0200 (MET DST) Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205]) by nez-perce.inria.fr (8.12.10/8.12.10) with ESMTP id i5PHMEEV016295 for ; Fri, 25 Jun 2004 19:22:14 +0200 Received: (qmail 21190 invoked from network); 25 Jun 2004 17:22:13 -0000 Received: from dialup-4.242.105.42.dial1.seattle1.level3.net (HELO sherlock.localdomain) (shawnw@[4.242.105.42]) (envelope-sender ) by mail5.speakeasy.net (qmail-ldap-1.03) with SMTP for ; 25 Jun 2004 17:22:12 -0000 Received: by sherlock.localdomain (Postfix, from userid 502) id 4D702FD15; Fri, 25 Jun 2004 10:25:34 -0700 (PDT) Date: Fri, 25 Jun 2004 10:25:34 -0700 From: Shawn Wagner To: caml-list@inria.fr Subject: Re: [Caml-list] Substring search on an array of strings Message-ID: <20040625172534.GH595@speakeasy.org> Mail-Followup-To: caml-list@inria.fr References: <20040625165512.GG595@speakeasy.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2i X-Miltered: at nez-perce with ID 40DC5F46.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; shawnw:01 caml-list:01 2004:99 pons:01 re-use:01 caml-list:01 shawnw:01 fernandez:01 speakeasy:01 speakeasy:01 0200,:01 olivier:02 binary:02 string:03 string:03 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Fri, Jun 25, 2004 at 07:05:13PM +0200, Diego Olivier Fernandez Pons wrote: > Bonjour, > > > If you need to find every substring in every element of the array > > then, yes, you'll have to look at every element of the array to see > > if that substring appears or not in it. > > No, or at least only in the worst case. > > In the same way the Knuth-Morris-Pratt suffix automaton allows you not > to search for every position in a string if it is not necessary, a > specific procedure for searching in an array of strings could avoid > you searching on every string. > > I will try to give some simple examples to get some intuition : > > - Suppose there are many strings that only contains a few different > letters. Then you could build first a map from subsets of the alphabet > to string indexes and only search in those strings And, of course, to build that map you have to look at every element of the array. Of course, you can then re-use the map, but it has to be generated somehow. > > - Suppose that the strings have a particular pattern (say > "processortype.memory.operatingsystem"). Then you could build indexes > on every processor type, memory or operating system, and search only > in the corresponding strings > You still have to look at the strings to find out where those corresponding strings are. Now, with something like your second example you can sort the array (Once again having to look at everything in the array) and use a binary search to find the range of strings starting with, say, athlon, without having to look at every element in that particular search. Your original questions didn't say anything about such extra information or patterns in the strings you're searching. If you want an answer you like, you have to give us enough data about the problem that we have a hope of coming up with one! Don't say: How do I find every substring in an array of strings. Say: In an array of strings of the format 'processortype.memory.operatingsystem', how can I find all strings that have 'linux' in the operatingsystem field? Aside: Why did you cc'ing caml-list when I only replied to you in that last email? (This one is going to caml-list too) -- Shawn Wagner shawnw@speakeasy.org ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners