On Fri, Jul 11, 2008 at 09:40:05AM -0700, Jo Rhett wrote:
> On Jul 11, 2008, at 6:43 AM, Bill Moseley wrote:
> > On Fri, Jul 11, 2008 at 01:11:56AM -0700, Jo Rhett wrote:
> >> (query string?)
> >> So while debugging a different problem I looked at my httpd logs and
> >> realized something I'd apparently missed before. The swish-e spider
> >> is looping over the same files dozens and dozens of times, each time
> >> with different query arguments. Because all of the links on the site
> >> contain a query_string containing the page they came from and a unique
> >> id for the visitor (and a dynamic toolbar has links to every page),
> >> this means that each page is indexed N-1 times, where N is the number
> >> of pages on the site.
> > That makes it hard for browsers to do any caching.
> It does. If the browser submits a cookie, the site uses the cookie. If
> the browser doesn't submit a cookie, the site adds query strings to
> track the browser. Since the spider ignores cookies, it gets the query
> strings added.
You can enable cookies with the spider.
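What enabling cookies buys you can be sketched in Python with the standard library's http.cookiejar. This is only an illustration of the idea, not swish-e's spider.pl (which is Perl): a client that keeps a cookie jar stores the visitor cookie the site sets on the first request and sends it back on later ones, so the site no longer needs to fall back to query-string tracking. The cookie name `visitor`, its value, and the example.com URLs are hypothetical, and a fake response object stands in for the server so the demo runs offline.

```python
import email.message
import http.cookiejar
import urllib.request


def make_cookie_aware_opener():
    """Build an opener that keeps a cookie jar, as a browser would.

    Cookies the server sets are stored in `jar` and sent back on later
    requests to the same site.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


class _FakeResponse:
    """Hypothetical stand-in for a server reply, so the demo runs offline."""

    def __init__(self, set_cookie):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        # CookieJar.extract_cookies() reads the Set-Cookie headers from here.
        return self._headers


def cookie_sent_on_next_request(set_cookie, first_url, next_url):
    """Store the cookie set in reply to first_url, then return the Cookie
    header the jar would attach to a request for next_url (or None)."""
    jar = http.cookiejar.CookieJar()
    first = urllib.request.Request(first_url)
    jar.extract_cookies(_FakeResponse(set_cookie), first)
    nxt = urllib.request.Request(next_url)
    jar.add_cookie_header(nxt)
    return nxt.get_header("Cookie")


if __name__ == "__main__":
    # Hypothetical visitor cookie set on page1, then sent back for page2.
    print(cookie_sent_on_next_request(
        "visitor=abc123; Path=/",
        "http://www.example.com/page1",
        "http://www.example.com/page2",
    ))
```

A real crawl would fetch every page through the `opener` returned by `make_cookie_aware_opener()`, so each request after the first carries the visitor cookie instead of getting a fresh tracking query string.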
Received on Fri Jul 11 16:13:42 2008