Re: [swish-e] swish-e looping over the same files again and again

From: Bill Moseley <moseley(at)>
Date: Fri Jul 11 2008 - 13:43:53 GMT
On Fri, Jul 11, 2008 at 01:11:56AM -0700, Jo Rhett wrote:
> (query string?)
> So while debugging a different problem I looked at my httpd logs and  
> realized something I'd apparently missed before.  The swish-e spider  
> is looping over the same files dozens and dozens of times, each time  
> with different query arguments.  Because all of the links on the site  
> contain a query_string containing the page they came from and a unique  
> id for the visitor (and a dynamic toolbar has links to every page),  
> this means that each page is indexed N-1 times, where N is the number  
> of pages on the site.

Why don't you use cookies for session management?  Putting a
per-visitor id in every URL also makes it hard for browsers to do any
caching.

> Is there an option to tell the swish spider to ignore the query string  
> when considering URLs?   I realize that this would be inappropriate  
> for many sites, but it is essential for this site, so an option would  
> be very useful.
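
As a rough illustration of what "ignore the query string" would mean
for a crawler, here is a minimal Python sketch. It is not swish-e code
and does not show any swish-e option. The helper names strip_query and
unique_pages are hypothetical, as are the example.com URLs, and it
assumes the query string never selects different content.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def unique_pages(urls):
    """Collapse URLs that differ only in their query string.

    Keeps the first occurrence of each page, in the order seen.
    """
    seen = set()
    pages = []
    for url in urls:
        key = strip_query(url)
        if key not in seen:
            seen.add(key)
            pages.append(key)
    return pages


if __name__ == "__main__":
    # Hypothetical links as a spider might extract them from the site.
    links = [
        "http://example.com/about.html?from=index&sid=111",
        "http://example.com/about.html?from=news&sid=111",
        "http://example.com/news.html?from=index&sid=111",
    ]
    for page in unique_pages(links):
        print(page)
```

With this kind of normalization each page is fetched once, however many
referrer and visitor-id combinations point at it.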

Quick search of the archives turns up this:

Bill Moseley

