Re: Index headers

From: <jmruiz(at)not-real.boe.es>
Date: Tue Sep 19 2000 - 15:48:23 GMT

Hi Bill,

On 19 Sep 2000, at 6:33, Bill Moseley wrote:

> I have a few questions here:
> 
> Stop words:
> -----------
> 
> For phrase highlighting I need to know what stop words were used to create
> the index.  I'd like a switch that would make swish print stopwords when
> printing the headers.  I'm not sure what the best swish letter would be.
> -W is too close to -w, perhaps.
> 
> swish -x -f index.file
> Stopwords: and if the a an
> 
> -x could be used to say "extended headers" and so if -x was used additional
> headers such as Wordcharacters, IgnoreFirst, and Stopwords would be included
> in the header display.
> 
> Does -x seem like a good switch letter for this?
> 

I can add some functions to the library to get this info:
- StopWords
- metaNames?
- Files?

The problem arises when you use several index files (more than 
one -f directive): you will get one line per file.

> BTW -- if there was enough info stored in the index headers, I could see
> reindexing an index just by saying:
> 
>    swish-e -C -f index.file
> 
> 
> 
> Library version of swish-e and reindexing:
> ------------------------------------------
> 
> Does SwishOpen() really open the index file?  Or is the file opened and
> closed on each search?

The file is opened and closed in SwishOpen, just to read the 
header information.
It also opens and closes the file on each search.

> 
> I ask this as I could see a mod_perl application where SwishOpen() is
> called once on the first request, but then the index is left open for the
> life of the Apache child process.  So if the file was reindexed you might
> end up searching an old index file until that Apache child dies.
> 

In this case, each search should reread the header info!!

> Perhaps SwishSearch() could stat the index file to see if it changed on
> disk and reopen?  Or maybe it would be better for the application to stat
> the index file and look for changes.
> 

This sounds good! SwishSearch just needs to reread the header 
information and update it if the file has changed. But what 
about your Perl program? It will need to update the extracted 
header info after each SwishSearch.
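
The stat-and-reread idea could be sketched roughly as follows. This is only
an illustration in Python, not swish-e code: ReloadingIndex and load_headers
are hypothetical names, and load_headers stands in for whatever actually
reads the header information from the index file.

```python
import os


class ReloadingIndex:
    """Hypothetical wrapper that rereads header info when the index file changes."""

    def __init__(self, path, load_headers):
        self.path = path
        # load_headers is an assumed callable: path -> header info.
        self._load_headers = load_headers
        self._stamp = None
        self.headers = None
        self.refresh()

    def _current_stamp(self):
        # Modification time, size and inode together catch both in-place
        # rewrites and a new file renamed over the old one.
        st = os.stat(self.path)
        return (st.st_mtime_ns, st.st_size, st.st_ino)

    def refresh(self):
        """Reread the headers if the file changed; return True if a reread happened."""
        stamp = self._current_stamp()
        if stamp != self._stamp:
            self.headers = self._load_headers(self.path)
            self._stamp = stamp
            return True
        return False
```

A caller would run refresh() before each search and re-extract its own copy
of the header info only when it returns True, so the extra cost per search
is a single stat call.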

> 
> Multiple indexes:
> -----------------
> When searching multiple indexes swish processes one index file at a time.
> You end up with headers like:
> 
>   # Search words: ( foo )
>   # Number of hits: 13
> 
> For each index file searched with results mixed in between.
> 
> Is there any way to process multiple index files as if they are a merged
> index file?  That is, get one set of headers where Number of hits: is equal
> to the total hits of all index files (and where the -b sort would sort ALL
> the results)?
> 
> I have two index files -- one is indexed once a week, and the other is
> indexed whenever a new entry is made during the week.  I don't want to
> merge the weekly index with the incremental index every time a new entry is
> made.
> 

I have been thinking about it, and I think this is the way it should 
work. Remember your first question: in this case, stopwords and 
metanames would also need to be merged in the output. Stopwords are 
no problem, just a little more work to do. But metaNames can 
have different numeric IDs in each index... Files can also be 
duplicated (as stopwords can).
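
To make the merging concrete, here is a rough sketch in Python. It is not
how swish-e does it: IndexResult, merge_results and merge_metanames are
hypothetical names, hits are assumed to be (rank, path) pairs, and a file
found in more than one index is assumed to be kept once with its highest
rank.

```python
from dataclasses import dataclass


@dataclass
class IndexResult:
    """Hypothetical per-index search result; not a swish-e structure."""
    stopwords: set
    hits: list  # (rank, path) tuples


def merge_results(results):
    """Combine per-index results into one header set and one sorted hit list."""
    merged_stopwords = set()
    best = {}  # path -> highest rank seen in any index
    for r in results:
        merged_stopwords |= r.stopwords
        for rank, path in r.hits:
            if path not in best or rank > best[path]:
                best[path] = rank
    # Sort ALL hits together: highest rank first, path as tie-breaker.
    hits = sorted(((rank, path) for path, rank in best.items()),
                  key=lambda h: (-h[0], h[1]))
    return {
        "stopwords": sorted(merged_stopwords),
        "number_of_hits": len(hits),
        "hits": hits,
    }


def merge_metanames(per_index):
    """per_index: list of {name: local_id} dicts, one per index file.

    Returns (global_ids, remaps), where remaps[i][local_id] gives the
    shared ID for that metaName in index i.
    """
    global_ids = {}
    remaps = []
    for names in per_index:
        remap = {}
        for name, local_id in sorted(names.items()):
            # Reuse the shared ID if the name was already seen, else assign the next one.
            remap[local_id] = global_ids.setdefault(name, len(global_ids) + 1)
        remaps.append(remap)
    return global_ids, remaps
```

With this, "Number of hits" is the size of the merged list rather than a
per-index count, and the metaName problem reduces to translating each
index's local IDs through its remap table before comparing them.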

> Thanks,
> 
Thanks, Bill

cu
Jose
Received on Tue Sep 19 15:48:41 2000