Index only meta names

From: <jmruiz(at)>
Date: Wed Aug 09 2000 - 16:09:34 GMT
Hi Bill,

On 9 Aug 2000, at 5:40, Bill Moseley wrote:

> Now I'm wondering:  Would this feature offer much on the indexing side?
> Swish would still have to parse the entire document for the meta names, and
> it wouldn't necessarily reduce the size of the index, correct?  Do you
> think indexing might be faster?

Indexing should not be much faster, because the entire file
must be parsed anyway (this is the slow part).
But while indexing you will need much more memory with 
complete files than with only metanames, since all the words and
their related info are kept in memory while indexing.
For the same reason, the size of the index will be reduced, but by
how much depends on the type and amount of information in 
your files.
Anyway, I do not think that search performance will suffer from 
storing all the words unless you have thousands of big files and 
many concurrent users.
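
To make the memory point concrete, here is a minimal sketch in Python (not swish-e code). The tag format, build_index and the only_meta flag are all hypothetical; it only illustrates that the whole file is scanned in both cases, while far fewer words are held in memory when only meta content is kept.

```python
import re
from collections import defaultdict

# Hypothetical toy format: <name>text</name> marks a meta name.
# This is NOT swish-e's real parser, only an illustration.
META_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.S)
WORD_RE = re.compile(r"\w+")

def build_index(docs, only_meta=False):
    """Return {word: {doc_id: [positions]}}, held entirely in memory."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        if only_meta:
            # The whole file is still scanned; only meta content is kept.
            text = " ".join(body for _name, body in META_RE.findall(text))
        else:
            # Drop the tags themselves but keep every word.
            text = re.sub(r"</?\w+>", " ", text)
        for pos, word in enumerate(WORD_RE.findall(text.lower())):
            index[word][doc_id].append(pos)
    return index

docs = {1: "<title>swish search</title> a long body with many other words"}
full = build_index(docs)
meta = build_index(docs, only_meta=True)
print(len(full), len(meta))  # distinct words kept in each case
```

How much is saved in practice depends on how much of each file sits outside the meta names.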

> Currently one can limit the search to meta names by specifying the names in
> the search, so I'm wondering if that feature would, well, be a feature.
> It might be nice to be able to say "Search in all the meta names" like
>   swish-e -w *=(swish or "best search engines") and have it look in all the 
> meta names.
> I'd also like to be able to do this:
>   swish-e -w meta1,meta2,meta4=(just words in those meta tabs)

You can do it if your files are structured this way:
<meta1>bla bla</meta1>
<meta2>bla bla</meta2>

swish-e -w metaT=(just words in those meta tabs)

If metaT includes all or part of your metatags, it can fit your needs.

If your files cannot be structured this way, the parser and several other 
functions have to be patched to get what you want.
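
Short of patching the parser, one possible workaround is to expand the comma-separated form outside swish-e before running the search. The sketch below is Python, and expand_meta_query is a hypothetical helper, not part of swish-e. It assumes that swish-e accepts several name=(...) clauses joined with "or", which should be checked against your version.

```python
def expand_meta_query(query):
    """Rewrite 'm1,m2=(words)' as 'm1=(words) or m2=(words)'.

    Hypothetical helper: assumes the search engine accepts
    or-combined meta name clauses.
    """
    names, sep, rest = query.partition("=")
    if not sep or "," not in names:
        return query  # no comma-separated meta names, nothing to expand
    return " or ".join(f"{name.strip()}={rest}" for name in names.split(","))

print(expand_meta_query("meta1,meta2,meta4=(just words)"))
```

The expanded string could then be passed to swish-e -w. Each clause names a single metaname, so a phrase inside the parentheses is only ever matched within one metaname.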

> Of course phrasing might be tricky in those cases as swish would have to
> find only phrases within one meta name.


> Anyway, do you think a feature to only index meta names would offer much
> benefit over just limiting the search to the meta names?
Indexing only metanames is easier to add.
Searching only metanames is just a little bit harder to code. If you 
are asking me which is easier to code, I will answer that indexing
is my choice. But if we want the more powerful one, I think limiting the 
search is what we need (poor me!!).

BTW, I do not like the parser very much. IMO, a clean RPN
(Reverse Polish Notation) approach would be better and clearer.
As Rainer said several mails ago, the indexer is also asking for
a rewrite: right now there are at least 4 functions doing
similar tasks: countwords, countwordstr, parsecomment and 
parseMetadata. So each change has to be made four times!!
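
As a rough illustration of the RPN idea, here is a sketch in Python (not the swish-e parser): a shunting-yard pass turns a tokenized boolean query into RPN, and a small stack machine evaluates it against the words of one document. The function names and the precedence table (not > and > or) are assumptions made for the example and may not match how swish-e orders its operators.

```python
# Assumed precedence for the example only: not > and > or.
PRECEDENCE = {"not": 3, "and": 2, "or": 1}

def to_rpn(tokens):
    """Convert infix boolean query tokens to RPN (shunting-yard)."""
    output, stack = [], []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the "("
        elif tok in PRECEDENCE:
            # Binary operators are left-associative; the prefix
            # operator "not" never pops anything before it is pushed.
            while (stack and stack[-1] in PRECEDENCE
                   and tok != "not"
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)  # a search word
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output

def eval_rpn(rpn, words):
    """Evaluate an RPN query against the set of words in one document."""
    stack = []
    for tok in rpn:
        if tok == "not":
            stack.append(not stack.pop())
        elif tok in ("and", "or"):
            right, left = stack.pop(), stack.pop()
            stack.append((left and right) if tok == "and" else (left or right))
        else:
            stack.append(tok in words)
    return stack.pop()

rpn = to_rpn("swish and ( fast or not slow )".split())
print(rpn)
print(eval_rpn(rpn, {"swish", "fast"}))
```

Once the query is in RPN, evaluation needs no recursion and no precedence checks, which is the main attraction of this approach.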

