So while debugging a different problem I looked at my httpd logs and
realized something I'd apparently missed before. The swish-e spider
is looping over the same files dozens and dozens of times, each time
with different query arguments. Every link on the site carries a
query string that records the page it came from and a unique id for
the visitor, and a dynamic toolbar links to every page. As a result,
each page is indexed N-1 times (once for every other page that links
to it), where N is the number of pages on the site.
Is there an option to tell the swish spider to ignore the query string
when considering URLs? I realize that this would be inappropriate
for many sites, but it is essential for this site, so an option would
be very useful.
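
To make the request concrete, here is a minimal Python sketch of the
behaviour I have in mind. It is not swish-e code and not an existing
spider option; the function names, the query parameter names and the
example.com URLs are all made up for illustration. It only shows the
idea of treating URLs that differ solely in their query string as the
same page.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def unique_pages(urls):
    """Keep the first occurrence of each page, ignoring query strings."""
    seen = set()
    result = []
    for url in urls:
        key = strip_query(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


if __name__ == "__main__":
    # Hypothetical links of the kind described above: same page,
    # different referring page in the query string.
    links = [
        "http://example.com/about.html?from=index&visitor=abc123",
        "http://example.com/about.html?from=contact&visitor=abc123",
        "http://example.com/contact.html?from=index&visitor=abc123",
    ]
    for page in unique_pages(links):
        print(page)
```

With something like this applied before the spider decides whether it
has already seen a URL, each page would be fetched and indexed once
rather than once per referring page.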
Net Consonance : consonant endings by net philanthropy, open source
and other randomness
Received on Fri Jul 11 04:12:01 2008