From: Shawn Wagner <shawnw@speakeasy.org>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Substring search on an array of strings
Date: Fri, 25 Jun 2004 10:25:34 -0700 [thread overview]
Message-ID: <20040625172534.GH595@speakeasy.org> (raw)
In-Reply-To: <Pine.A41.4.44.0406251856450.901162-100000@ibm1>
On Fri, Jun 25, 2004 at 07:05:13PM +0200, Diego Olivier Fernandez Pons wrote:
> Bonjour,
>
> > If you need to find every substring in every element of the array
> > then, yes, you'll have to look at every element of the array to see
> > if that substring appears or not in it.
>
> No, or at least only in the worst case.
>
> In the same way the Knuth-Morris-Pratt suffix automaton allows you not
> to search for every position in a string if it is not necessary, a
> specific procedure for searching in an array of strings could avoid
> you searching on every string.
>
> I will try to give some simple examples to get some intuition :
>
> - Suppose there are many strings that only contains a few different
> letters. Then you could build first a map from subsets of the alphabet
> to string indexes and only search in those strings
And, of course, to build that map you have to look at every element of the
array. Of course, you can then re-use the map, but it has to be generated
somehow.
>
> - Suppose that the strings have a particular pattern (say
> "processortype.memory.operatingsystem"). Then you could build indexes
> on every processor type, memory or operating system, and search only
> in the corresponding strings
>
You still have to look at the strings to find out where those corresponding
strings are. Now, with something like your second example you can sort the
array (Once again having to look at everything in the array) and use a
binary search to find the range of strings starting with, say, athlon,
without having to look at every element in that particular search.
Your original questions didn't say anything about such extra information or
patterns in the strings you're searching. If you want an answer you like,
you have to give us enough data about the problem that we have a hope of
coming up with one! Don't say: How do I find every substring in an array of
strings. Say: In an array of strings of the format
'processortype.memory.operatingsystem', how can I find all strings that have
'linux' in the operatingsystem field?
Aside: Why did you cc'ing caml-list when I only replied to you in that last
email? (This one is going to caml-list too)
--
Shawn Wagner
shawnw@speakeasy.org
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
next prev parent reply other threads:[~2004-06-25 17:22 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20040625165512.GG595@speakeasy.org>
2004-06-25 17:05 ` Diego Olivier Fernandez Pons
2004-06-25 17:25 ` Shawn Wagner [this message]
2004-06-24 7:52 Diego Olivier Fernandez Pons
2004-06-24 15:51 ` Shawn Wagner
2004-06-25 16:21 ` Diego Olivier Fernandez Pons
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20040625172534.GH595@speakeasy.org \
--to=shawnw@speakeasy.org \
--cc=caml-list@inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox