Re: indexing and windows - my problem

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Feb 25 2002 - 14:57:24 GMT
At 06:41 AM 2/25/2002 -0800, Gaye Karagulle wrote:


>I am going to develop a library program in visual
>basic, that does indexing using "vector space model"
>and I need to find  the words and their corresponding
>frequencies, of each document in my database, in order
>to create vectors for each document. And stemming
>should be done at the same time, so that "run", "runs",
>"running", etc. are counted as the same word. The
>word frequencies will be used as weights in the
>document vectors.
>
>can I create these document vectors using swish-e? if
>yes how?

Not sure I'm following what you want.  Doesn't sound like you need a search
engine.

Do you need to find the documents or are you just interested in word
frequency?

If it's just frequency, then I'd probably just parse, stem, and tally up the
counts.  I'm not sure why you would need swish for that.
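
To make "parse, stem, and tally" concrete, here is a minimal sketch in Python
(not Visual Basic, and not anything swish does).  The names crude_stem and
term_frequencies are made up for illustration, and the stemmer is a deliberately
crude suffix stripper standing in for a real one such as Porter's; it only
handles the "run"/"runs"/"running" kind of case.

```python
import re
from collections import Counter


def crude_stem(word):
    """Hypothetical, very rough stemmer: strips "-ing" or a plural "-s".

    This is an assumption-laden stand-in for a real stemming algorithm.
    """
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Undo consonant doubling left behind by "-ing" (runn -> run).
        if word[-1] == word[-2] and word[-1] not in "lsz":
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word


def term_frequencies(text):
    """Parse text into lowercase words, stem each one, and tally the counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(w) for w in words)


# One frequency table per document; the document ids here are invented.
documents = {
    "doc1": "Run, runs and running",
    "doc2": "The word frequencies are used as weights",
}
vectors = {doc_id: term_frequencies(text) for doc_id, text in documents.items()}
```

Each Counter in vectors maps a stemmed word to its count in that document,
which is the raw material for the document vectors described in the question.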

With swish you can use some of the -T options to dump the index, which should
give you word counts.  -T index_words_full will tell you the frequency of
each word, but it's a lot of output to parse.



Bill Moseley
mailto:moseley@hank.org
Received on Mon Feb 25 14:58:07 2002