Re: [swish-e] swish-e looping over the same files again and again

From: Bill Moseley <moseley(at)>
Date: Fri Jul 11 2008 - 20:13:42 GMT
On Fri, Jul 11, 2008 at 09:40:05AM -0700, Jo Rhett wrote:
> On Jul 11, 2008, at 6:43 AM, Bill Moseley wrote:
> > On Fri, Jul 11, 2008 at 01:11:56AM -0700, Jo Rhett wrote:
> >> (query string?)
> >>
> >> So while debugging a different problem I looked at my httpd logs and
> >> realized something I'd apparently missed before.  The swish-e spider
> >> is looping over the same files dozens and dozens of times, each time
> >> with different query arguments.  Because all of the links on the site
> >> contain a query_string containing the page they came from and a unique
> >> id for the visitor (and a dynamic toolbar has links to every page),
> >> this means that each page is indexed N-1 times, where N is the number
> >> of pages on the site.
> >
> > Why don't you use cookies for session management?  Your setup kind of
> > makes it hard for browsers to do any caching.
> It does.  If the browser submits a cookie then it uses that.  If the
> browser doesn't submit a cookie then it adds query strings to track
> the browser.  Since the spider ignores cookies, it gets the query
> strings added.

You can enable cookies with the spider.
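
To illustrate why the query strings multiply the URLs the spider sees, and
what it means to collapse them back to one URL per page, here is a rough
Python sketch. It is not part of swish-e or its spider. The parameter names
"from" and "sid" are assumptions for illustration only, since the original
post does not name them.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical tracking parameter names; the original post does not name them.
TRACKING_PARAMS = {"from", "sid"}


def canonical_url(url):
    """Return url with the tracking parameters removed from its query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def unique_pages(urls):
    """Collapse URLs that differ only in their tracking parameters."""
    return sorted({canonical_url(u) for u in urls})
```

With cookies enabled the site should stop adding those parameters in the
first place, so a filter like this would only matter if cookies cannot be
used.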

Bill Moseley

Received on Fri Jul 11 16:13:42 2008