Re: [swish-e] Terms in URL?

From: Kevin Porter <kev(at)>
Date: Sat Jan 19 2008 - 21:00:50 GMT
Thanks Bill, I've read the relevant parts of the docs and am working on 
it. ExtractPath (with ExtractPathDefault) looks like the way to go, 
because it seems I can't search for documents whose swishdocpath meta 
does not contain the search phrase?
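
For reference, a rough sketch of the kind of config this points to. The
meta name "isarchive" and the values "blogarchive" and "normal" are made up
for illustration, and the regex is untested against a real index, so treat
it as an assumption to check against the ExtractPath docs rather than a
known-good setup:

```
# Hypothetical meta name; declared here on the assumption that
# ExtractPath wants the meta name to be known to the index.
MetaNames isarchive

# If the URL contains the problem string, replace the whole path with
# the single word "blogarchive" and index that word under isarchive.
ExtractPath isarchive regex !^.*widgetType=BlogArchive.*$!blogarchive!

# Any document whose path did not match gets "normal" instead.
ExtractPathDefault isarchive normal
```

After re-indexing, a query along the lines of "someword and isarchive=normal"
should then return only the non-duplicate documents, since every document
carries one of the two values and no bare "not" query is needed.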

Re-indexing is no problem. Unfortunately I can't simply remove the docs 
from my collection because I haven't figured out how to do that yet with 
the crawler I'm using! (I'll figure out how to do it when I'm more 
familiar with the source code).


- Kev

Bill Moseley wrote:
> On Sat, Jan 19, 2008 at 01:15:03PM +0000, Kevin Porter wrote:
>> Hi,
>> I've somehow ended up with a few duplicates in my index, and need to 
>> remove them, or filter them out of the search results. Before 
>> implementing it on the web front-end side, I'd like to know if it's 
>> possible to filter them out with a command line option to swish-e, or to 
>> remove them totally? The problem URLs contain the string 
>> "widgetType=BlogArchive". I'm not even sure if swish-e matches terms 
>> against the URL, or can be made to.
> If you use
>     MetaNames swishdocpath
> then the path will be indexed.  So then you could likely
> filter on that string.
> If you want finer control check out ExtractPath.
> But, both of those would require re-indexing so in that case you might
> as well not index the files you don't want to include in the index.

Kevin Porter
Advanced Web Construction Ltd

Received on Sat Jan 19 16:00:56 2008